List of Amazon Elastic MapReduce (EMR) Customers
Seattle, WA 98109-5210, United States
Since 2010, our global team of researchers has been studying Amazon Elastic MapReduce (EMR) customers around the world, aggregating the massive volume of data points that forms the basis of our quarterly forecast assumptions and tracks the rise and fall of vendors and their products.
Each quarter our research team identifies companies that have purchased Amazon Elastic MapReduce (EMR) for Extract, Transform, and Load (ETL) from public sources (press releases, customer references, testimonials, case studies and success stories) and proprietary sources, capturing customer size, industry, location, implementation status, partner involvement, LOB key stakeholders and the contact details of related IT decision-makers.
Companies using Amazon Elastic MapReduce (EMR) for Extract, Transform, and Load (ETL) include: JPMorgan Chase, a United States based Banking and Financial Services organisation with 317,233 employees and revenues of $180.60 billion; Pfizer, a United States based Life Sciences organisation with 81,000 employees and revenues of $63.63 billion; Goldman Sachs, a United States based Banking and Financial Services organisation with 48,300 employees and revenues of $53.51 billion; Triton Health System, a United States based Insurance organisation with 16,070 employees and revenues of $36.26 billion; Fannie Mae, a United States based Banking and Financial Services organisation with 7,000 employees and revenues of $30.85 billion; and many others.
Contact us if you need a complete and verified list of companies using Amazon Elastic MapReduce (EMR), including breakdowns by industry (21 verticals), geography (region, country, state, city), company size (revenue, employees, assets) and the related IT decision-makers, key stakeholders, and business and technology executives responsible for the software purchases.
The Amazon Elastic MapReduce (EMR) customer wins are incorporated into our Enterprise Applications Buyer Insight and Technographics Customer Database, which has over 100 data fields detailing company usage of software systems and their digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!
| Logo | Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI | Insight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Ameriprise Financial | Banking and Financial Services | 13800 | $15.5B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2020 | n/a | |
In 2020 Ameriprise Financial deployed Amazon Elastic MapReduce (EMR) as a core component of its Big Data processing and analytics platform to support enterprise data modeling, reporting, and machine learning pipelines. The implementation emphasized cloud-native batch and streaming workloads, using Amazon Elastic MapReduce (EMR) to run Spark, MapReduce, and Spark Streaming jobs that processed web server logs and other ingestion streams stored in Amazon S3.
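As a hedged illustration of the per-record transform such a Spark log-processing job might apply (the log layout, field names, and function are illustrative assumptions, not Ameriprise's actual code), a minimal access-log parser in plain Python:

```python
import re
from datetime import datetime

# Common Log Format pattern; an assumption about the web server log
# layout, not Ameriprise's actual schema.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def parse_log_line(line):
    """Parse one access-log line into a dict, or None if malformed.

    In an EMR pipeline this kind of function would run inside a Spark
    map step over log files landed in S3; here it is plain Python so
    the transform logic is visible on its own.
    """
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

line = '203.0.113.9 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
rec = parse_log_line(line)
```

In Spark the same function would typically be applied via `rdd.map(parse_log_line)` or wrapped in a DataFrame UDF before the parsed records are written back to S3.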
The EMR deployment was configured to support data lake and data warehouse patterns, with ETL and transformation logic implemented in Python and Spark, and orchestration scheduled via AWS Data Pipeline, Airflow, and Oozie for daily, weekly, and monthly job cadences. Serverless functions were introduced using AWS Lambda with assigned IAM roles and triggers from SQS and SNS to kick off Python-based processing steps and to migrate ETL outputs into AWS Glue and Amazon Athena for ad hoc querying.
Integrations with core AWS services were explicit, including Amazon S3 for landing and staging, Redshift for analytical warehousing, Glue and Athena for serverless catalog and query services, and SageMaker and Elasticsearch for model development and indexed analytics. The implementation also tied into Kafka and Storm for real-time ingestion, HDFS and HBase for persistent big data stores on the Hadoop stack, and downstream visualization via Power BI and SAS Visual Analytics.
Governance and operational controls included purpose-designed AWS landing zones with IAM and VPC considerations, data governance and profiling workflows, integration testing strategies for ETL jobs, and a CI/CD pipeline using Docker and GitHub for deployment automation. The program delivered both infrastructure modernization for data science and reporting workloads and explicit performance tuning on the warehouse tier, with Redshift optimization noted to make queries up to 100x faster for Power BI and SAS Visual Analytics.
| | B3 Brazil | Banking and Financial Services | 2889 | $2.0B | Brazil | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2016 | n/a | |
In 2016, B3 Brazil implemented Amazon Elastic MapReduce (EMR) for Extract, Transform, and Load (ETL) to operationalize Hadoop and Spark processing for core data engineering workflows. The EMR deployment was positioned to host large-scale ETL and analytic jobs that previously ran on on-premise Hadoop tooling, aligning the platform with the company's data engineering function.
Implementation work focused on provisioning managed cluster infrastructure for batch and interactive processing, with configuration for Hive, Spark SQL, and MapReduce execution engines. Development practices included Java-based integrations using the Hadoop and Hive APIs and PySpark workloads, reflecting the team’s existing skill set in Java development and Spark SQL.
Data ingestion and processing pipelines integrated explicit technologies from the environment, including Sqoop and Flume for ingestion, Hive for ELT, and Impala for ad-hoc queries, while Sentry and Kerberos were used for security and authorization controls. The environment interoperated with the Cloudera platform components that were part of the existing stack, enabling reuse of ETL patterns, metadata practices, and query tools during the EMR rollout.
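As a hedged sketch of the Hive-style ELT pattern referenced above (the table name, partition column, and path convention are illustrative assumptions, not B3's actual layout), ingested records land under Hive partition directories like so:

```python
from collections import defaultdict

def hive_partition_paths(records, table="trades", key="trade_date"):
    """Group records by Hive-style partition directory (dt=<value>).

    Illustrative only: in the real ELT flow Hive's dynamic partitioning
    handles this; the sketch shows the directory convention the data
    lands under in HDFS or S3, which downstream queries prune against.
    """
    parts = defaultdict(list)
    for rec in records:
        parts[f"/warehouse/{table}/dt={rec[key]}"].append(rec)
    return dict(parts)

rows = [
    {"trade_date": "2016-05-02", "symbol": "PETR4"},
    {"trade_date": "2016-05-02", "symbol": "VALE3"},
    {"trade_date": "2016-05-03", "symbol": "PETR4"},
]
parts = hive_partition_paths(rows)
# Two partitions: dt=2016-05-02 (2 rows) and dt=2016-05-03 (1 row)
```

Partition pruning is what makes the Impala ad-hoc queries mentioned above cheap: a query filtered on `dt` only scans the matching directories.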
Operational ownership rested with B3’s data engineering and platform teams in Brazil, covering architecture, cluster maintenance, and workload scheduling. Governance work emphasized secure authentication and authorization through Kerberos and Sentry integration, and the implementation preserved existing ELT and data ingestion workflows while shifting execution to Amazon Elastic MapReduce (EMR).
| | Credit One Bank | Banking and Financial Services | 2500 | $1.5B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2017 | n/a | |
In 2017, Credit One Bank implemented Amazon Elastic MapReduce (EMR) for Extract, Transform, and Load (ETL). The EMR deployment served as the bank's cluster compute layer for large-scale Spark and Hadoop workloads, supporting PySpark-based ETL and analytic pipelines developed by the data engineering team in Las Vegas, NV.
The implementation centered on Spark technologies, specifically PySpark, Spark SQL, and Spark MLlib, with Spark Streaming used to convert streaming input into micro-batches for downstream processing. EMR clusters were provisioned alongside S3 for durable data lake storage and EC2 for worker nodes, with AWS CloudFormation templates used to codify multi-tier application deployments and to enforce availability, fault tolerance, and auto-scaling behavior.
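The micro-batching idea can be shown in plain Python (a simplified analogue: Spark Streaming batches by time interval, not by record count, and the batch size here is an illustrative assumption):

```python
def micro_batches(stream, batch_size=3):
    """Yield fixed-size micro-batches from a record stream.

    A plain-Python analogue of what Spark Streaming does by time
    window: buffer incoming records, then emit each buffer as one
    batch for the downstream ETL step.
    """
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

batches = list(micro_batches(range(7), batch_size=3))
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```

Each emitted batch corresponds to one unit of work a downstream Spark job (or, in the real deployment, a Spark Streaming micro-batch) would process and persist.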
Integrations were explicit and modular: EMR ran Spark and Hive jobs that consumed data landed via the Kafka REST API and Apache NiFi workflows, while Sqoop was used for relational database ingest into the Hadoop file system (HDFS). AWS Glue crawler jobs were used for data cataloging and building ETL pipelines targeting data marts, with downstream storage and analytics integrations including Amazon Redshift, Snowflake, and Tableau for BI consumption.
Operational governance and delivery followed Agile and Scrum methodology for project and team management, with PySpark applications and GitHub used for development lifecycle control. Monitoring and access control were implemented using CloudWatch and IAM respectively, and the surrounding AWS service footprint included Lambda, SNS, SQS, RDS, DynamoDB and other platform services as part of the runtime architecture.
| | Fannie Mae | Banking and Financial Services | 7000 | $30.9B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2018 | n/a | |
In 2018, Fannie Mae implemented Amazon Elastic MapReduce (EMR) to support Extract, Transform, and Load (ETL) workloads for enterprise data integration and analytics. The engagement centered on building scalable ETL pipelines using EMR and Spark, with work performed by AWS data development resources to meet requirements gathered from business development stakeholders and downstream analytics teams.
Implementation focused on modular ETL development using PySpark on Amazon Elastic MapReduce (EMR), SQL-based transformations, and data modeling in star and snowflake schemas. Functional capabilities included ingestion and parsing of structured and semi-structured sources, PySpark runtime tuning with partitioning and salting strategies, edge-case testing, and pipeline promotion across development, staging, and production environments.
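The salting strategy mentioned above can be sketched in a few lines (key names and the salt count are illustrative assumptions, not Fannie Mae's actual configuration):

```python
def salt_key(key, row_id, num_salts=8):
    """Spread rows that share a skewed key across num_salts sub-keys.

    Each row's salt is derived from a row-level attribute (row_id), so
    rows carrying the same hot key land in different Spark partitions
    while the mapping stays deterministic. On the small side of a join,
    each key is replicated once per salt value so every salted row
    still finds its match.
    """
    return f"{key}#{row_id % num_salts}"

# One hot key across 1000 rows now spans 8 sub-keys instead of
# overloading a single partition.
salted = {salt_key("loan_type=30yr", i) for i in range(1000)}
```

Without salting, a groupBy or join on a skewed key sends all of its rows to one executor; with it, the work fans out across `num_salts` partitions and results are re-aggregated afterwards.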
The EMR deployment integrated with a broad AWS and data ecosystem, explicitly including Amazon S3 for raw and processed storage, Amazon Redshift for analytic consumption, Snowflake for warehousing and stored procedure based normalization, and Teradata and Hadoop sources for legacy data lift. Operational tooling and metadata were tied to PostgreSQL for pipeline documentation, Bitbucket for code merges, Confluence for runbook updates, Tableau for interactive dashboards, and AWS services such as Glue, Athena, DMS and Schema Conversion Tool as adjunct ETL and migration utilities.
Governance practices emphasized documented pipelines and collaboration workflows: merging completed code to Bitbucket, updating Confluence entries, notifying stakeholders of job status and resolved errors, and training technical staff on solution access and usage. Operational ownership included scheduled maintenance of EFS and EMR clusters, routine debugging and performance reviews to maintain production stability, and coordinated validation with the Software Solution and MTD developer teams.
Outcomes explicitly captured included creation of Tableau visualizations and dashboards that enabled business users and executives to explore product usage and customer trends, alongside targeted testing and PySpark performance tuning to avoid downtime and improve pipeline reliability. The Fannie Mae Amazon Elastic MapReduce (EMR) Extract, Transform, and Load (ETL) implementation delivered an integrated, documented platform for downstream analytics and operational reporting.
| | Goldman Sachs | Banking and Financial Services | 48300 | $53.5B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2020 | n/a | |
In 2020 Goldman Sachs deployed Amazon Elastic MapReduce (EMR) as part of its Extract, Transform, and Load (ETL) stack to support batch processing and enterprise data engineering workflows. The EMR implementation included provisioning and ongoing maintenance of Hadoop clusters on AWS Elastic MapReduce and development of Spark applications to ingest and transform data from relational databases and Hive sources.
Configuration and functional modules focused on EMR-managed components including Hive, HBase, and Spark, with Amazon S3 serving as the primary data lake for raw and staged datasets. ETL pipelines were implemented using EMR for heavy batch processing, AWS Glue Studio for data integration and cataloging, AWS Lambda for serverless pipeline steps, and Apache Airflow DAGs to orchestrate weblog extraction and scheduled workflows.
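The scheduling guarantee an Airflow DAG provides can be shown with the standard library alone (task names and dependencies are illustrative assumptions, not Goldman Sachs's actual DAG):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# A plain-Python stand-in for Airflow-style orchestration: each task
# maps to the set of upstream tasks that must finish first.
dag = {
    "extract_weblogs": set(),
    "transform_spark": {"extract_weblogs"},
    "load_redshift": {"transform_spark"},
    "refresh_glue_catalog": {"transform_spark"},
}

# static_order() yields every task only after all of its upstreams,
# which is exactly the ordering an Airflow scheduler enforces.
order = list(TopologicalSorter(dag).static_order())
```

In Airflow itself the same shape would be expressed with operators and `>>` dependencies; the topological ordering underneath is identical.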
Integrations spanned multiple AWS services explicitly used in the program, including Amazon Redshift for storing transformed data, Amazon Athena for querying structured S3 data, Amazon Kinesis for streaming inputs, EC2 for compute workloads, and the Glue Catalog for schema management. Downstream analytic and reporting consumption used Tableau and Alteryx, while operational data movement included Spark jobs writing to Cassandra tables; development and deployment workflows used Git, JIRA, Maven or Ant with Jenkins, and Control-M for job scheduling.
Operational scope emphasized the data engineering organization collaborating with business analysts and data scientists to provision virtualized, queryable data sets, with responsibilities covering ETL pipeline design, data standardization, Python automation scripts using libraries such as Boto3, Pandas, and NumPy, and advanced analytics support. Governance relied on Glue Catalog metadata, Airflow scheduling and version control to maintain pipeline reproducibility and traceability, and AWS cloud security practices to secure datasets and processing environments.
| | JPMorgan Chase | Banking and Financial Services | 317233 | $180.6B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2020 | n/a | |
| | Pfizer | Life Sciences | 81000 | $63.6B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2017 | n/a | |
| | Triton Health System | Insurance | 16070 | $36.3B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2022 | n/a | |
| | | Communications | 1300 | $1.0B | Australia | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2017 | n/a | |
| | | Retail | 13500 | $12.2B | United States | Amazon Web Services (AWS) | Amazon Elastic MapReduce (EMR) | Extract, Transform, and Load (ETL) | 2021 | n/a | |
Buyer Intent: Companies Evaluating Amazon Elastic MapReduce (EMR)
Discover Software Buyers actively Evaluating Enterprise Applications
| Logo | Company | Industry | Employees | Revenue | Country | Evaluated |
|---|---|---|---|---|---|---|
| No data found | | | | | | |