List of Apache Spark MLlib Customers
Since 2010, our global team of researchers has been studying Apache Spark MLlib customers around the world, aggregating large volumes of data points that form the basis of our forecast assumptions and help track the rise and fall of vendors and their products on a quarterly basis.
Each quarter, our research team identifies, from public sources (press releases, customer references, testimonials, case studies and success stories) and proprietary sources, companies that have purchased Apache Spark MLlib for ML and Data Science Platforms. Each record includes the customer size, industry, location, implementation status, partner involvement, line-of-business (LOB) key stakeholders and contact details for related IT decision-makers.
Companies using Apache Spark MLlib for ML and Data Science Platforms include: Salesforce, a United States based Professional Services organisation with 76453 employees and revenues of $37.90 billion; CrowdStrike, a United States based Professional Services organisation with 10363 employees and revenues of $3.95 billion; Yelp, a United States based Professional Services organisation with 5116 employees and revenues of $1.41 billion; FINRA, a United States based Professional Services organisation with 3600 employees and revenues of $1.11 billion; GumGum, a United States based Professional Services organisation with 480 employees and revenues of $113.0 million; and many others.
Contact us if you need a complete and verified list of companies using Apache Spark MLlib, including the breakdown by industry (21 Verticals), Geography (Region, Country, State, City), Company Size (Revenue, Employees, Assets) and the related IT Decision Makers, Key Stakeholders, and business and technology executives responsible for the software purchases.
The Apache Spark MLlib customer wins are being incorporated in our Enterprise Applications Buyer Insight and Technographics Customer Database which has over 100 data fields that detail company usage of software systems and their digital transformation initiatives. Apps Run The World wants to become your No. 1 technographic data source!
| Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI |
|---|---|---|---|---|---|---|---|---|---|
| CrowdStrike | Professional Services | 10363 | $4.0B | United States | Apache Software | Apache Spark MLlib | ML and Data Science Platforms | 2016 | n/a |

In 2016, CrowdStrike implemented Apache Spark MLlib, within the ML and Data Science Platforms category, to perform large-scale feature extraction and to drive machine learning classification of event data ingested from Falcon Host, its software-as-a-service endpoint protection solution. The deployment focused on embedding Apache Spark MLlib into the data processing pipeline used by the security research and engineering teams to support behavioral analysis and model training workflows.
The implementation concentrated on Spark-based feature engineering and model scoring capabilities, using Apache Spark MLlib for scalable distributed machine learning workloads. CrowdStrike configured Spark jobs for batch feature extraction and iterative model development, and instrumented job lifecycle controls to align compute usage with the engineering team’s need for agility.
Operational integration included coupling Apache Spark MLlib with CrowdStrike’s Apache Cassandra-backed Threat Graph and running the analytics stack on AWS infrastructure to reduce operational overhead. The architecture emphasized ephemeral instance control for Cassandra, including the ability to start and stop nodes for environment rebuilds, and scalable compute provisioning for Spark to address rapidly growing event volumes.
Governance and operational requirements centered on high availability, scalability, and cost-effective storage for petabyte-scale Cassandra data. Rollout priorities included maintaining uptime for Falcon Host ingestion pipelines, enabling reproducible environment rebuilds for engineering, and ensuring Spark MLlib workflows could scale without increasing on-premises operational burden.
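
| Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI |
|---|---|---|---|---|---|---|---|---|---|
| FINRA | Professional Services | 3600 | $1.1B | United States | Apache Software | Apache Spark MLlib | ML and Data Science Platforms | 2019 | n/a |
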
In 2019, FINRA deployed Apache Spark MLlib on Amazon EMR to move from on-premises SQL batch processes to cloud-native distributed analytics for billions of time-ordered market events. The work was implemented within the ML and Data Science Platforms category to provide scalable machine learning infrastructure for surveillance and analytics use cases.
Configuration emphasized Apache Spark MLlib-based model training and machine learning pipeline orchestration, enabling feature engineering, iterative model development, and large-scale distributed computation. Workloads were restructured from nightly SQL batch jobs to continuous Spark workflows to support faster training cycles and backtesting on historic market downturn datasets.
Operationally, the deployment used Amazon EMR for compute elasticity to process high-velocity market event streams and historical order tapes at scale. These compute and ML workflows were consumed by data science teams supporting market surveillance, risk analytics, investor protection, and market integrity functions.
Governance shifted from batch release cycles to pipeline and model governance with standardized validation and backtesting workflows to ensure model integrity for surveillance and compliance. FINRA can now test models on realistic data from market downturns, enhancing its ability to provide investor protection and promote market integrity.
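
| Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI |
|---|---|---|---|---|---|---|---|---|---|
| GumGum | Professional Services | 480 | $113M | United States | Apache Software | Apache Spark MLlib | ML and Data Science Platforms | 2017 | n/a |
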
In 2017, GumGum implemented Apache Spark MLlib to operationalize machine learning across its advertising analytics stack and to handle extremely high event volumes. The implementation targeted a platform that ingests more than 1 billion events per day, approximately 6 TB of data daily, and was selected to support continuous processing and model-driven inventory forecasting, addressing the company's need to expedite customer decision-making and scale quickly.
Apache Spark MLlib was deployed on Amazon EMR as the primary machine learning runtime, with configurations for model training, batch scoring, and feature engineering pipelines. The deployment uses Apache Spark MLlib for inventory forecasting workflows and integrates standard Spark MLlib capabilities for model fitting, transformation pipelines, and distributed feature processing to support programmatic and native advertising analytics.
The architecture places ad servers at the event edge, writing event logs that are uploaded to Amazon Simple Storage Service (Amazon S3) on an hourly cadence. AWS Data Pipeline orchestrates production, testing, and development workflows; Amazon EMR runs Apache Spark MLlib workloads alongside Hadoop for hourly data processing; and processed outputs are persisted into Amazon Redshift for downstream analytics and reporting. Operational coverage includes production, testing, and development environments and impacts the ad operations and analytics functions responsible for campaign forecasting and reporting.
Governance and operationalization relied on pipeline-driven environment segregation and hourly ingestion patterns to remove processing bottlenecks and meet continuous processing requirements. The implementation of Apache Spark MLlib at GumGum is positioned as a scalable, EMR-hosted machine learning layer within the larger AWS-based data pipeline, designed to support programmatic advertising, signals derived from image recognition, and customer-facing analytics.
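
To put the GumGum volumes quoted above in perspective, the daily totals (more than 1 billion events, approximately 6 TB) can be turned into rough per-event and per-hour averages. The following is only a back-of-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly across the day, neither of which is stated in the source, and the function name `sizing` is purely illustrative.

```python
# Back-of-envelope sizing from the daily figures quoted in the text.
# Assumptions (not from the source): decimal units (1 TB = 10**12 bytes)
# and traffic spread evenly over the day.

EVENTS_PER_DAY = 1_000_000_000   # "more than 1 billion events per day"
BYTES_PER_DAY = 6 * 10**12       # "approximately 6 TB of data daily"
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_DAY = 24


def sizing(events_per_day: int, bytes_per_day: int) -> dict:
    """Derive average rates from daily totals (hypothetical helper)."""
    return {
        "avg_event_bytes": bytes_per_day / events_per_day,
        "events_per_second": events_per_day / SECONDS_PER_DAY,
        "gb_per_hourly_batch": bytes_per_day / HOURS_PER_DAY / 10**9,
    }


if __name__ == "__main__":
    for name, value in sizing(EVENTS_PER_DAY, BYTES_PER_DAY).items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the averages work out to about 6 KB per event, roughly 11,600 events per second, and about 250 GB per hourly upload. Because the source figures are themselves approximate, these numbers are indicative only.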

| Customer | Industry | Empl. | Revenue | Country | Vendor | Application | Category | When | SI |
|---|---|---|---|---|---|---|---|---|---|
| Salesforce | Professional Services | 76453 | $37.9B | United States | Apache Software | Apache Spark MLlib | ML and Data Science Platforms | 2020 | n/a |
| Yelp | Professional Services | 5116 | $1.4B | United States | Apache Software | Apache Spark MLlib | ML and Data Science Platforms | 2018 | n/a |
