List of Apache PySpark Customers
Since 2010, our global team of researchers has been studying Apache PySpark customers around the world, aggregating data points that form the basis of our forecast assumptions and that may indicate the rise and fall of certain vendors and their products on a quarterly basis.
Each quarter our research team uses public sources (press releases, customer references, testimonials, case studies and success stories) and proprietary sources to identify companies that have purchased Apache PySpark for API Management, recording the customer size, industry, location, implementation status, partner involvement, LOB key stakeholders and contact details for related IT decision-makers.
Companies using Apache PySpark for API Management include: NextEra Energy, a United States based Utilities organisation with 16800 employees and revenues of $24.75 billion, Synchrony, a United States based Banking and Financial Services organisation with 20000 employees and revenues of $16.13 billion and many others.
Contact us if you need a complete and verified list of companies using Apache PySpark, including the breakdown by industry (21 Verticals), Geography (Region, Country, State, City), Company Size (Revenue, Employees, Asset) and the related IT decision makers, key stakeholders, and business and technology executives responsible for the software purchases.
The Apache PySpark customer wins are being incorporated in our Enterprise Applications Buyer Insight and Technographics Customer Database which has over 100 data fields that detail company usage of software systems and their digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!
| Logo | Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI | Insight | Insight Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | NextEra Energy | Utilities | 16800 | $24.8B | United States | Apache Software | Apache PySpark | API Management | 2018 | n/a | In 2018, NextEra Energy deployed Apache PySpark as part of a Big Data analytics build within its IT organization, establishing a cloud native analytics foundation to process IoT sensor telemetry from Florida solar assets. A contract Cloud Engineer from ProTek Consulting led the end to end design and deployment on AWS, aligning the effort with corporate migration sequencing while enabling departmental analytics ahead of broader schedules. Apache PySpark was embedded in the ETL layer to perform scalable transformations as sensor data moved into the cloud, with ingestion and orchestration architecture using AWS DMS, AWS Glue, Amazon S3, and AWS Lambda to stage and transform records before landing in Amazon Redshift. Infrastructure provisioning was automated through AWS CloudFormation to ensure repeatability across environments, and ETL jobs and data pipelines were instrumented for operational visibility. The implementation integrated Amazon Redshift with both Power BI via ODBC and JDBC connectors and with Amazon QuickSight to enable a hybrid BI strategy that supported centralized corporate reporting alongside agile internal dashboards. Security and governance controls were implemented using IAM policies, VPC isolation, and KMS encryption, and the solution maintained close alignment with on premises network, data, and compliance stakeholders to preserve hybrid operational continuity. Operational governance included automated provisioning workflows, observability using Amazon CloudWatch to track ETL job health, data latency, and Redshift query performance, and a structured handoff to IT and analytics teams. The build reduced infrastructure setup time by 25 percent and improved issue detection and incident response by 40 percent as reported by the implementation team, and the solution served as a reference model for subsequent departmental cloud initiatives. | |
| | Synchrony | Banking and Financial Services | 20000 | $16.1B | United States | Apache Software | Apache PySpark | API Management | 2018 | n/a | In 2018, Synchrony implemented Apache PySpark. The implementation positioned Apache PySpark as the primary engine for large scale ETL and big data processing. Implementation scope centered on designing and developing scalable ETL pipelines using AWS Glue, PySpark, and SQL, with Python-based orchestration using AWS Lambda and Step Functions. Functional capabilities implemented included JSON encoding and decoding with PySpark to transform semi-structured data into analytics-ready tables, automated ingestion and validation pipelines, and data validation frameworks built with PySpark and Pandas. Integrations and operational architecture leveraged AWS services including S3 for landing and persistent storage, Glue for cataloging and ETL orchestration, Redshift for analytic modeling, EMR for Spark-based batch processing, IAM and KMS for security, and CloudWatch for monitoring. Migration work included moving on-premises DB2 datasets to Amazon S3 using AWS Glue and PySpark, and Redshift schema design used distribution keys, sort keys, and materialized views to improve query execution. Governance and operational controls emphasized secure access and compliance through IAM role management, S3 bucket policies, and KMS encryption, while operationalizing monitoring and incident workflows with CloudWatch, JIRA, and ServiceNow in an Agile Scrum delivery model. The Apache PySpark implementation supported cross-functional data engineering and analytics teams, and included automated source file ingestion, data quality checks, and cleanup workflows that reduced manual intervention and addressed performance bottlenecks through SQL and ETL optimization. | |
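
The Synchrony entry mentions decoding JSON so that semi-structured data becomes analytics-ready tables, together with automated data quality checks. The sketch below illustrates that general pattern in plain Python rather than PySpark, so it runs without a cluster. The record layout, the field names (`account_id`, `balance.amount`) and the helper functions are hypothetical and are not taken from either customer's implementation.

```python
import json

# Hypothetical semi-structured input; not data from any customer listed above.
RAW_RECORDS = [
    '{"account_id": "A1", "balance": {"amount": 120.5, "currency": "USD"}}',
    '{"account_id": "A2", "balance": {"amount": null, "currency": "USD"}}',
    'not valid json',
]


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def decode_and_validate(lines, required=("account_id", "balance.amount")):
    """Split raw JSON lines into clean rows and rejected lines with a reason."""
    clean, rejected = [], []
    for line in lines:
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            rejected.append((line, "malformed JSON"))
            continue
        if not isinstance(parsed, dict):
            rejected.append((line, "not a JSON object"))
            continue
        row = flatten(parsed)
        # A required field counts as missing when absent or null.
        missing = [field for field in required if row.get(field) is None]
        if missing:
            rejected.append((line, "missing " + ", ".join(missing)))
        else:
            clean.append(row)
    return clean, rejected


clean_rows, rejected_rows = decode_and_validate(RAW_RECORDS)
```

In a PySpark pipeline the same two steps would more likely be expressed with `pyspark.sql.functions.from_json` against an explicit schema, followed by null filters on the required columns. Keeping rejected lines alongside a reason, rather than dropping them, is one way to support the kind of cleanup workflow the entry describes.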
Buyer Intent: Companies Evaluating Apache PySpark
Discover Software Buyers actively Evaluating Enterprise Applications
| Logo | Company | Industry | Employees | Revenue | Country | Evaluated | ||
|---|---|---|---|---|---|---|---|---|
| No data found | ||||||||