List of Apache Hadoop Customers
Wilmington, DE 19801, United States
Since 2010, our global team of researchers has been studying Apache Hadoop customers around the world, aggregating massive amounts of data that form the basis of our forecast assumptions and of our quarterly assessments of the rise and fall of vendors and their products.
Each quarter, our research team identifies companies that have purchased Apache Hadoop for Database Management, drawing on public sources (press releases, customer references, testimonials, case studies, and success stories) and proprietary sources. For each customer we capture company size, industry, location, implementation status, partner involvement, LOB key stakeholders, and contact details for the related IT decision-makers.
Companies using Apache Hadoop for Database Management include:

- Walmart, a United States based Retail organization with 2,100,000 employees and revenues of $681.00 billion
- Apple, a United States based Manufacturing organization with 166,000 employees and revenues of $416.16 billion
- United Healthcare, a United States based Insurance organization with 400,000 employees and revenues of $400.28 billion
- McKesson, a United States based Professional Services organization with 45,000 employees and revenues of $400.00 billion
- CVS Health, a United States based Healthcare organization with 219,000 employees and revenues of $372.81 billion

and many others.
Contact us if you need a complete and verified list of companies using Apache Hadoop, including breakdowns by industry (21 verticals), geography (region, country, state, city), and company size (revenue, employees, assets), along with the related IT decision-makers, key stakeholders, and business and technology executives responsible for the IaaS software purchases.
Apache Hadoop customer wins are incorporated into our Enterprise Applications Buyer Insight and Technographics Customer Database, which includes over 100 data fields detailing each company's usage of IaaS software systems and its digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!
| Logo | Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI | Insight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 2K Games | Professional Services | 3000 | $930M | United States | Apache Software | Apache Hadoop | Database Management | 2017 | n/a | |
In 2017, 2K Games deployed Apache Hadoop to establish a centralized Big Data Infrastructure supporting the company's analytics and data engineering functions. Apache Hadoop was positioned as the primary distributed storage and processing backbone, consolidating the ingestion, transformation, and manipulation of data from across the organization for the analytics team.
The implementation centered on Hadoop as the storage and compute foundation, paired with category-aligned processing frameworks to host ETL and ELT pipelines that ingest from REST APIs and internal systems. Analytics engineering responsibilities emphasized building reusable, scalable data models, managing ETL processes into data warehouses, scheduling jobs to pull data from APIs, and maintaining internal databases used for reporting and dashboarding.
Integrations were explicitly aligned to the existing toolset documented in hiring and engineering notes, including AWS EC2 and S3 for cloud compute and object storage, Amazon EMR and Spark for distributed processing, and Redshift and columnar storage for analytical warehousing. The deployment also interfaced with SQL and NoSQL databases, Pentaho-style ETL patterns, and visualization backends such as Tableau to support dashboarding and access control for analytics consumers.
Governance and operational changes focused on instituting continuous integration pipelines, enforcing software development best practices on the analytics team, and formalizing access control and job scheduling for reporting workloads. Implementation emphasis was on maintainability and scalability, enabling data engineers and data scientists to build, test, and deploy pipelines while centralizing data ingestion, transformation, and model orchestration within the Big Data Infrastructure.
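The extract-transform-load flow described above can be sketched in miniature as follows. This is an illustrative sketch only, not 2K Games' actual pipeline: the payload shape, the `player_id` key, and the list standing in for a warehouse writer are all hypothetical.

```python
import json
from datetime import datetime, timezone

def extract(raw_payload: str) -> list[dict]:
    """Parse a raw REST API response (JSON text) into records."""
    return json.loads(raw_payload)

def transform(records: list[dict]) -> list[dict]:
    """Normalize records toward a warehouse schema: lowercase keys,
    stamp a load time, and drop rows missing the primary key."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    out = []
    for rec in records:
        rec = {k.lower(): v for k, v in rec.items()}
        if "player_id" not in rec:
            continue
        rec["loaded_at"] = loaded_at
        out.append(rec)
    return out

def load(records: list[dict], warehouse: list) -> int:
    """Append normalized records to the target table (a plain list
    stands in here for a Hive/Redshift writer)."""
    warehouse.extend(records)
    return len(records)

# Usage: push one micro-batch through the pipeline.
warehouse: list[dict] = []
payload = '[{"Player_ID": 1, "Score": 900}, {"Score": 50}]'
n = load(transform(extract(payload)), warehouse)
```

A scheduler would invoke this flow per API poll; the row missing its key is rejected during transform rather than at load time.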
| | 3M | Manufacturing | 61500 | $24.6B | United States | Apache Software | Apache Hadoop | Database Management | 2012 | n/a | |
In 2012, 3M deployed Apache Hadoop as Big Data Infrastructure to underpin the Data Science Lab's clinical data analytics platform. The Data Science Lab was responsible for leading independent R&D projects in machine learning and advanced statistical analysis of healthcare data, extracting knowledge from large and diverse datasets and building predictive models and novel algorithms for production deployment on 3M's clinical data analytics platform.
Apache Hadoop was configured to provide distributed storage and scalable batch processing, forming the core data lake and serving as persistent staging for analytic workloads. Workflows combined Apache Hadoop with Apache Spark for in-memory processing, while standard language runtimes such as Python, R and Java were used for model development and SQL was used for data interrogation. Notebook environments including Jupyter Notebook supported exploratory analysis and iterative model building.
Integrations reflected the team's documented toolchain, including Spark and developer tooling such as PyCharm and RStudio, with Git and Gerrit used for source control and code review to support reproducible model pipelines. The environment operated alongside cloud and hybrid analytics platforms cited by the lab, including AWS, Databricks, Google Cloud and BigQuery, enabling a mix of on-premises Hadoop storage and cloud compute or analytics engines for model training and batch scoring. Apache Hadoop served as the canonical Big Data Infrastructure layer connecting raw healthcare feeds to downstream clinical analytics and R&D consumption.
Governance and operational practices centered on the Data Science Lab, with controlled data access, code review workflows using Git and Gerrit, and lifecycle handoff patterns for moving models from experimentation to production. Operational responsibilities included configuration management, job scheduling and packaging of models for deployment into 3M's clinical data analytics platform.
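The experimentation-to-production handoff described above follows a common batch-scoring pattern: a model trained in the lab is packaged as a pure function and applied to records staged in the data lake. The sketch below is hypothetical (a deliberately trivial one-parameter "model", not 3M's actual algorithms) and only illustrates the packaging shape.

```python
def train(history: list[tuple[float, int]]) -> float:
    """'Train' a one-parameter model: the mean feature value of the
    positive class becomes the decision threshold."""
    positives = [x for x, label in history if label == 1]
    return sum(positives) / len(positives)

def score_batch(threshold: float, batch: list[float]) -> list[int]:
    """Apply the packaged model to a staged batch of feature values."""
    return [1 if x >= threshold else 0 for x in batch]

# Usage: train once in the lab, then score staged batches in production.
threshold = train([(0.2, 0), (0.8, 1), (1.0, 1)])  # mean of 0.8 and 1.0
predictions = score_batch(threshold, [0.5, 0.95])
```

Separating `train` from `score_batch` is what makes the lifecycle handoff clean: only the fitted parameter crosses the boundary, so the scoring side can be reviewed and scheduled like any other batch job.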
| | 4INFO | Professional Services | 100 | $28M | United States | Apache Software | Apache Hadoop | Database Management | 2010 | n/a | |
In 2010, 4INFO implemented Apache Hadoop as Big Data Infrastructure, architecting a new reporting infrastructure on Hadoop and Hive focused on delivering near-real-time reporting capabilities for the business. The deployment established a distributed data storage and processing foundation with Apache Hadoop as the core platform and Apache Hive as the SQL query layer, enabling analyst-friendly reporting and ad hoc queries against large datasets.
The implementation included configuration of storage and compute tiers, a metadata and query service through Hive, and the construction of ingestion pipelines for both batch and micro-batch data flows to support low-latency reporting. Functional capabilities emphasized scalable HDFS-based storage, a query and analytics layer via Apache Hive, and pipeline orchestration and transformation logic to normalize data for downstream reports.
Operational coverage targeted the reporting and analytics function across 4INFO, with governance workstreams for data schema management, access control, and query governance to ensure consistent reporting. Apache Hadoop and Apache Hive were the platform components enabling the Big Data Infrastructure, and the project outcome was a centralized Hadoop-based reporting infrastructure that gave the business near-real-time reporting capability.
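Near-real-time reporting on a batch platform typically means micro-batching: events are rolled up into fixed time windows on a short cycle, the same aggregation a periodic Hive query would compute for a dashboard. The sketch below illustrates that rollup in plain Python; the event shape (`ts` in epoch seconds, `type`) is a hypothetical stand-in, not 4INFO's schema.

```python
from collections import defaultdict

def micro_batch_report(events: list[dict], window_s: int = 60) -> dict:
    """Aggregate raw events into fixed windows keyed by
    (window_start, event_type), counting events per window."""
    counts: dict = defaultdict(int)
    for ev in events:
        window_start = ev["ts"] - ev["ts"] % window_s
        counts[(window_start, ev["type"])] += 1
    return dict(counts)

# Usage: events at ts 10 and 59 share the [0, 60) window; ts 61 falls
# into the [60, 120) window.
report = micro_batch_report([
    {"ts": 10, "type": "click"},
    {"ts": 59, "type": "click"},
    {"ts": 61, "type": "view"},
])
```

Running this rollup every minute over the latest ingested slice is what turns a batch store into a "near real time" reporting source.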
| | AbbVie | Life Sciences | 55000 | $56.3B | United States | Apache Software | Apache Hadoop | Database Management | 2014 | n/a | |
In 2014, AbbVie implemented Apache Hadoop as a Database Management platform to enable supply chain analytics on a Hadoop ecosystem deployment. The implementation targeted supply chain analytics workloads, consolidating transactional and operational feeds into a centralized Hadoop data lake to support analytics for inventory, demand planning, and logistics functions within AbbVie's supply chain organization.
The Apache Hadoop implementation emphasized distributed storage and large scale batch processing, leveraging core Hadoop architectural patterns such as HDFS for durable distributed storage and Hadoop processing frameworks for parallel analytics. Configuration focused on data ingestion pipelines and schema-on-read principles common to Hadoop deployments, enabling downstream analytics and reporting layers to consume normalized and raw supply chain datasets.
Operational coverage centered on supply chain, procurement, and logistics business functions, where the Hadoop ecosystem was used to host consolidated event and transaction data for analytics engineering. Data governance and access controls were instituted to align with life sciences data stewardship needs, ensuring controlled access to analytic artifacts and auditability for regulated operational datasets.
AbbVie's Apache Hadoop deployment served as a Database Management foundation for supply chain analytics, providing a scalable repository and processing layer for large-volume supply chain data and enabling standard Hadoop-oriented workflows for ETL, batch analytics, and data science experimentation.
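The schema-on-read principle mentioned above means raw feeds are stored untouched and the schema is applied only at query time, so older records with missing or differently typed fields can still be read. A minimal sketch, with entirely hypothetical field names (`sku`, `qty`, `site`) standing in for supply chain attributes:

```python
import json

# Raw records land in the lake as-is; note the two feeds disagree on
# types and fields (qty as string vs int, site missing in the older feed).
RAW_LAKE = [
    '{"sku": "A1", "qty": "5", "site": "CHI"}',
    '{"sku": "B2", "qty": 3}',
]

def read_with_schema(raw: list[str]) -> list[dict]:
    """Apply the schema when reading, not when writing: cast qty to
    int and default missing fields, leaving raw records untouched."""
    rows = []
    for line in raw:
        rec = json.loads(line)
        rows.append({
            "sku": rec["sku"],
            "qty": int(rec["qty"]),
            "site": rec.get("site", "UNKNOWN"),
        })
    return rows

rows = read_with_schema(RAW_LAKE)
```

Because the cast and defaulting happen at read time, a new report can impose a different schema on the same raw data without re-ingesting anything.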
| | Activision Blizzard | Professional Services | 13000 | $7.5B | United States | Apache Software | Apache Hadoop | Database Management | 2015 | n/a | |
In 2015, Activision Blizzard implemented Apache Hadoop as a core component of its Database Management footprint to support the Data Services platform that ingests and analyzes game telemetry. The deployment was scoped to serve multiple Activision titles, including the Call of Duty franchise and specific projects such as Call of Duty: Black Ops III, Infinite Warfare, Skylanders SuperChargers, and Guitar Hero Live, reflecting an enterprise big data use case inside the company's Data Services team.
Apache Hadoop served as the distributed storage and processing layer within a cloud-native, scalable data platform, and the implementation included Hadoop alongside Hive for SQL-on-Hadoop and Qubole for managed big data operations. The project encompassed data warehouse design and development work, leveraging Hadoop for batch processing and Hive metadata to support analytics and reporting workflows required by game analytics teams.
The architecture integrated Apache Hadoop with a broader streaming and cloud stack, explicitly including Kafka for event streaming, Amazon Web Services for cloud infrastructure, DCOS for orchestration, and Redshift for analytic warehousing where appropriate. CI and deployment tooling were incorporated into the flow, with Jenkins and Docker used to automate builds and containerized service delivery, and Dropwizard microservices, Java, and Python used for data processing and service APIs.
Governance and operational practices were addressed through engineering mentorship and process improvements led by the Software Architect for Data Services, with emphasis on reproducible builds via Maven, microservice design patterns, and automated pipelines. The implementation framed Apache Hadoop as the Database Management backbone for game telemetry and analytics, supporting developer workflows and the Data Services operational model in the Vancouver engineering organization.
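A common shape for the stream-to-batch handoff described above is to partition incoming telemetry by title before the batch layer processes each partition independently. The sketch below only illustrates that partitioning step; the event fields, title abbreviations, and the list standing in for a Kafka topic are hypothetical.

```python
from collections import defaultdict

def partition_telemetry(stream: list[dict]) -> dict[str, list[dict]]:
    """Group raw telemetry events by game title, the unit the batch
    layer (e.g. per-title Hadoop jobs) processes independently."""
    batches: dict[str, list[dict]] = defaultdict(list)
    for event in stream:
        batches[event["title"]].append(event)
    return dict(batches)

# Usage: three events from two titles yield two per-title batches.
batches = partition_telemetry([
    {"title": "CoD", "event": "match_start"},
    {"title": "GH", "event": "song_start"},
    {"title": "CoD", "event": "match_end"},
])
```

Partitioning by title keeps each game's analytics workload isolated, so one franchise's telemetry spike does not delay another's reporting jobs.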
| | | Professional Services | 3650 | $650M | United States | Apache Software | Apache Hadoop | Database Management | 2017 | n/a | |
| | | Professional Services | 130 | $15M | United States | Apache Software | Apache Hadoop | Database Management | 2004 | n/a | |
| | | Professional Services | 435 | $140M | United States | Apache Software | Apache Hadoop | Database Management | 2012 | n/a | |
| | | Media | 569 | $137M | United States | Apache Software | Apache Hadoop | Database Management | 2015 | n/a | |
| | | Insurance | 3500 | $650M | United States | Apache Software | Apache Hadoop | Database Management | 2013 | n/a | |
Buyer Intent: Companies Evaluating Apache Hadoop
- Merck, a United States based Life Sciences organization with 73000 Employees
- Bexhill College United Kingdom, a United Kingdom based Education company with 150 Employees
- Horizon Forest Products, a United States based Distribution organization with 200 Employees
Discover Software Buyers actively Evaluating Enterprise Applications
| Logo | Company | Industry | Employees | Revenue | Country | Evaluated |
|---|---|---|---|---|---|---|
| | Merck | Life Sciences | 73000 | $64.2B | United States | 2026-01-12 |
| | Bexhill College United Kingdom | Education | 150 | $15M | United Kingdom | 2026-01-05 |
| | Horizon Forest Products | Distribution | 200 | $30M | United States | 2025-11-16 |
| | | Healthcare | 26000 | $3.9B | United States | 2025-09-30 |
| | | Consumer Packaged Goods | 35 | $8M | United Arab Emirates | 2025-09-09 |
| | | Professional Services | 6000 | $3.2B | United States | 2025-08-13 |
| | | Manufacturing | 21000 | $8.5B | United Kingdom | 2025-05-09 |
| | | Education | 608 | $220M | United States | 2025-04-15 |
| | | Education | 4700 | $1.2B | Sweden | 2025-03-25 |
| | | Education | 1500 | $170M | Canada | 2025-03-10 |