Federator.ai

Supported OS: Linux

Overview

ProphetStor Federator.ai is an AI-based solution designed to enhance computing resource management for Kubernetes and Virtual Machine (VM) clusters. It provides holistic observability of IT operations, including multi-tenant Large Language Model (LLM) training, so that resources for mission-critical applications, namespaces, nodes, and clusters can be allocated efficiently and KPIs can be met with minimal resource waste.

Using advanced machine learning algorithms to predict application workloads, Federator.ai offers:

  • AI-based workload prediction for containerized applications in Kubernetes clusters, as well as VMs in VMware clusters, Amazon Web Services (AWS) Elastic Compute Cloud (EC2), Azure Virtual Machine, and Google Compute Engine
  • Resource recommendations based on workload prediction, application, Kubernetes, and other related metrics
  • Automatic provisioning of CPU/memory for generic Kubernetes application controllers/namespaces
  • Automatic scaling of Kubernetes application containers, Kafka consumer groups, and NGINX Ingress upstream services
  • Multicloud cost analysis and recommendations based on workload predictions for Kubernetes clusters and VM clusters
  • Actual cost and potential savings based on recommendations for clusters, Kubernetes applications, VMs, and Kubernetes namespaces
  • Multi-tenant LLM training observability and actionable resource optimizations without compromising performance

ProphetStor Federator.ai provides full-stack observability, from application-level workloads (including LLM training) to cluster-level resource consumption, through its APIs integrated with Datadog Agents. This integration creates a feedback loop between live monitoring and predictive analytics that continuously improves resource management, optimizes costs, and keeps applications running efficiently. You can track and predict the resource usage of Kubernetes containers, namespaces, and cluster nodes, and use the resulting recommendations to prevent costly over-provisioning or performance-impacting under-provisioning. Through integration with CI/CD pipelines, Federator.ai enables continuous optimization of containers whenever they are deployed in a Kubernetes cluster. Using application workload predictions, Federator.ai auto-scales application containers at the right time and optimizes performance with the right number of container replicas through the Kubernetes Horizontal Pod Autoscaler (HPA) or the Datadog Watermark Pod Autoscaler (WPA).

For additional information on Federator.ai, see the ProphetStor Federator.ai Feature Demo and ProphetStor Federator.ai for Datadog videos.

ProphetStor Federator.ai Cluster Overview

  • Cluster Resource Usage Predictions and Recommendations

    • This table shows the maximum, minimum, and average values of the CPU/memory workload prediction and the CPU/memory resource usage recommended by Federator.ai for cluster resource planning.
  • Cluster Node Resource Usage Predictions and Recommendations

    • This table shows the maximum, minimum, and average values of the CPU/memory workload prediction and the CPU/memory resource usage recommended by Federator.ai for node resource planning.
  • Node Current/Predicted Memory Usage (Daily)

    • This graph shows daily predicted memory usage from Federator.ai and the memory usage of the nodes.
  • Node Current/Predicted Memory Usage (Weekly)

    • This graph shows weekly predicted memory usage from Federator.ai and the memory usage of the nodes.
  • Node Current/Predicted Memory Usage (Monthly)

    • This graph shows monthly predicted memory usage from Federator.ai and the memory usage of the nodes.
  • Node Current/Predicted CPU Usage (Daily)

    • This graph shows daily predicted CPU usage from Federator.ai and the CPU usage of the nodes.
  • Node Current/Predicted CPU Usage (Weekly)

    • This graph shows weekly predicted CPU usage from Federator.ai and the CPU usage of the nodes.
  • Node Current/Predicted CPU Usage (Monthly)

    • This graph shows monthly predicted CPU usage from Federator.ai and the CPU usage of the nodes.
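The maximum, minimum, and average values shown in the prediction tables are plain summary statistics over a prediction window. The following is a minimal sketch of that arithmetic, assuming the predictions arrive as a list of numbers; the function name and sample values are hypothetical and are not part of Federator.ai.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_predictions(values: Sequence[float]) -> Dict[str, float]:
    """Summarize predicted usage samples over one prediction window."""
    if not values:
        raise ValueError("at least one predicted value is required")
    return {"max": max(values), "min": min(values), "avg": mean(values)}


# Hypothetical predicted CPU usage samples, in millicores.
summary = summarize_predictions([200, 400, 600])
print(summary)
```

The same helper applies to memory predictions; only the unit of the input values changes.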

ProphetStor Federator.ai Application Overview

  • Workload Prediction for Next 24 Hours

    • This table shows the maximum, minimum, and average values of the CPU/memory workload prediction and the CPU/memory resource usage recommended by Federator.ai for controller resource planning in the next 24 hours.
  • Workload Prediction for Next 7 Days

    • This table shows the maximum, minimum, and average values of the CPU/memory workload prediction and the CPU/memory resource usage recommended by Federator.ai for controller resource planning in the next 7 days.
  • Workload Prediction for Next 30 Days

    • This table shows the maximum, minimum, and average values of the CPU/memory workload prediction and the CPU/memory resource usage recommended by Federator.ai for controller resource planning in the next 30 days.
  • Current/Predicted CPU Usage (Daily)

    • This graph shows daily predicted CPU usage from Federator.ai and the CPU usage of the controllers.
  • Current/Predicted CPU Usage (Weekly)

    • This graph shows weekly predicted CPU usage from Federator.ai and the CPU usage of the controllers.
  • Current/Predicted CPU Usage (Monthly)

    • This graph shows monthly predicted CPU usage from Federator.ai and the CPU usage of the controllers.
  • Current/Predicted Memory Usage (Daily)

    • This graph shows daily predicted memory usage from Federator.ai and the memory usage of the controllers.
  • Current/Predicted Memory Usage (Weekly)

    • This graph shows weekly predicted memory usage from Federator.ai and the memory usage of the controllers.
  • Current/Predicted Memory Usage (Monthly)

    • This graph shows monthly predicted memory usage from Federator.ai and the memory usage of the controllers.
  • Current/Desired/Recommended Replicas

    • This graph shows the recommended replicas from Federator.ai and the desired and current replicas of the controllers.
  • Memory Usage/Request/Limit vs Rec Memory Limit

    • This graph shows the memory limit recommended by Federator.ai alongside the memory request, memory limit, and current memory usage of the controllers.
  • CPU Usage/Request/Limit vs Rec CPU Limit

    • This graph shows the CPU limit recommended by Federator.ai alongside the CPU request, CPU limit, and current CPU usage of the controllers.
  • CPU Usage/Limit Utilization

    • This graph shows the CPU utilization of the controller and whether it is over or under the CPU limit.
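As a rough illustration of the CPU Usage/Limit Utilization panel, utilization can be read as usage divided by the configured limit, with a value above 1.0 meaning the controller is over its limit. The sketch below assumes both values are expressed in millicores; the function names and sample numbers are hypothetical and do not come from Federator.ai.

```python
def cpu_limit_utilization(usage_millicores: float, limit_millicores: float) -> float:
    """Return CPU usage as a fraction of the CPU limit (1.0 means exactly at the limit)."""
    if limit_millicores <= 0:
        raise ValueError("CPU limit must be positive")
    return usage_millicores / limit_millicores


def limit_status(utilization: float) -> str:
    """Classify a utilization ratio relative to the limit."""
    return "over limit" if utilization > 1.0 else "within limit"


# Hypothetical controller using 250 millicores against a 500 millicore limit.
ratio = cpu_limit_utilization(250, 500)
print(f"{ratio:.0%} of limit: {limit_status(ratio)}")
```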

ProphetStor Federator.ai Kafka Overview

  • Recommended Replicas vs Current/Desired Replicas

    • This timeseries graph shows the recommended replicas from Federator.ai and the desired and current replicas in the system.
  • Production vs Consumption vs Production Prediction

    • This timeseries graph shows the Kafka message production rate and consumption rate and the production rate predicted by Federator.ai.
  • Kafka Consumer Lag

    • This timeseries graph shows the sum of consumer lags from all partitions.
  • Consumer Queue Latency (msec)

    • This timeseries graph shows the average latency of a message in the message queue before it is received by a consumer.
  • Deployment Memory Usage

    • This timeseries graph shows the memory usage of consumers.
  • Deployment CPU Usage

    • This timeseries graph shows the CPU usage of consumers.
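The Kafka Consumer Lag panel sums lag across partitions. Consumer lag for a partition is commonly defined as the gap between the latest broker offset and the offset the consumer group has reached. The sketch below shows that arithmetic under this assumption; the function name and the per-partition offsets are hypothetical sample data.

```python
from typing import Dict


def total_consumer_lag(broker_offsets: Dict[int, int], consumer_offsets: Dict[int, int]) -> int:
    """Sum per-partition lag, where lag = broker offset - consumer offset."""
    total = 0
    for partition, broker_offset in broker_offsets.items():
        # A partition with no recorded consumer offset is treated as unread from offset 0.
        consumed = consumer_offsets.get(partition, 0)
        # Clamp at zero so a momentarily stale broker offset never yields negative lag.
        total += max(broker_offset - consumed, 0)
    return total


# Hypothetical offsets keyed by partition number.
broker = {0: 100, 1: 250, 2: 80}
consumer = {0: 90, 1: 200, 2: 80}
print(total_consumer_lag(broker, consumer))
```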

ProphetStor Federator.ai Multi-Cloud Cost Analysis Overview

  • Current Cluster Cost and Current Cluster Configuration

    • These tables show the current cost and the environment configuration of the clusters.
  • Recommended Cluster - AWS and Recommended Cluster Configuration - AWS

    • These tables show the AWS instance configuration recommended by Federator.ai and the cost of the recommended AWS instances.
  • Recommended Cluster - Azure and Recommended Cluster Configuration - Azure

    • These tables show the Azure instance configuration recommended by Federator.ai and the cost of the recommended Azure instances.
  • Recommended Cluster - GCP and Recommended Cluster Configuration - GCP

    • These tables show the GCP instance configuration recommended by Federator.ai and the cost of the recommended GCP instances.
  • Namespace with Highest Cost ($/day)

    • This graph shows the namespace with the highest daily cost in the current cluster.
  • Namespace with Highest Predicted Cost ($/month)

    • This graph shows the namespace with the highest predicted monthly cost in the current cluster.

Setup

  • Follow the instructions below to download and set up Federator.ai.

Installation

  1. Log in to your OpenShift/Kubernetes cluster.

  2. Install Federator.ai for OpenShift/Kubernetes with the following command:

    $ curl https://raw.githubusercontent.com/containers-ai/prophetstor/master/deploy/federatorai-launcher.sh | bash
    
    ...
    Please enter Federator.ai version tag [default: latest]:latest
    Please enter the path of Federator.ai directory [default: /opt]:
    
    Downloading v4.5.1-b1562 tgz file ...
    Done
    Do you want to use a private repository URL? [default: n]:
    Do you want to launch Federator.ai installation script? [default: y]:
    
    Executing install.sh ...
    Checking environment version...
    ...Passed
    Enter the namespace you want to install Federator.ai [default: federatorai]:
    .........
    Downloading Federator.ai alamedascaler sample files ...
    Done
    ========================================
    Which storage type you would like to use? ephemeral or persistent?
    [default: persistent]:
    Specify log storage size [e.g., 2 for 2GB, default: 2]:
    Specify AI engine storage size [e.g., 10 for 10GB, default: 10]:
    Specify InfluxDB storage size [e.g., 100 for 100GB, default: 100]:
    Specify storage class name: managed-nfs-storage
    Do you want to expose dashboard and REST API services for external access? [default: y]:
    
    ----------------------------------------
    install_namespace = federatorai
    storage_type = persistent
    log storage size = 2 GB
    AI engine storage size = 10 GB
    InfluxDB storage size = 100 GB
    storage class name = managed-nfs-storage
    expose service = y
    ----------------------------------------
    Is the above information correct [default: y]:
    Processing...
    
    (snipped)
    .........
    All federatorai pods are ready.
    
    ========================================
    You can now access GUI through https://<YOUR IP>:31012
    Default login credential is admin/admin
    
    Also, you can start to apply alamedascaler CR for the target you would like to monitor.
    Review administration guide for further details. 
    ========================================
    ========================================
    You can now access Federatorai REST API through https://<YOUR IP>:31011
    The default login credential is admin/admin
    The REST API online document can be found in https://<YOUR IP>:31011/apis/v1/swagger/index.html
    ========================================
    
    Install Federator.ai v4.5.1-b1562 successfully
    
    Downloaded YAML files are located under /opt/federatorai/installation
    
    Downloaded files are located under /opt/federatorai/repo/v4.5.1-b1562
    
  3. Verify Federator.ai pods are running properly.

    $ kubectl get pod -n federatorai
    
  4. Log in to the Federator.ai GUI. The URL and login credentials are shown in the output of step 2.
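The pod check in step 3 can also be scripted. The sketch below assumes `kubectl` is on the PATH and the default `federatorai` namespace was used; it parses the output of `kubectl get pod -n federatorai -o json` and lists pods that are not yet ready. The helper names are hypothetical and are not part of the Federator.ai installer.

```python
import json
import subprocess
from typing import List


def not_ready_pods(pod_list: dict) -> List[str]:
    """Return names of pods that are not Running with every container ready."""
    pending = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # Completed one-shot pods are not a problem.
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if phase != "Running" or not all_ready:
            pending.append(pod["metadata"]["name"])
    return pending


def fetch_pods(namespace: str = "federatorai") -> dict:
    """Run kubectl and return the parsed pod list for the namespace."""
    result = subprocess.run(
        ["kubectl", "get", "pod", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `not_ready_pods(fetch_pods())` returns an empty list once all Federator.ai pods are ready.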

Configuration

  1. Log in to Datadog with your account and get an API key and application key for using the Datadog API.

  2. Configure Federator.ai for the metrics data source per cluster.

    • In the Federator.ai GUI, go to Configuration > Clusters and click “Add Cluster”.
    • Enter the API key and application key.

  3. See the Federator.ai - Installation and Configuration Guide and User Guide for more details.
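Before entering the keys in the Add Cluster window, it can help to confirm that the API key is accepted by Datadog. The sketch below uses the Datadog API key validation endpoint (`GET /api/v1/validate` with the `DD-API-KEY` header). The `datadoghq.com` site is an assumption, so adjust it for your Datadog region; the helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build (but do not send) the key validation request for a Datadog site."""
    return urllib.request.Request(
        f"https://api.{site}/api/v1/validate",
        headers={"DD-API-KEY": api_key, "Accept": "application/json"},
        method="GET",
    )


def api_key_is_valid(api_key: str, site: str = "datadoghq.com") -> bool:
    """Send the validation request; an invalid key is expected to be rejected with HTTP 403."""
    try:
        with urllib.request.urlopen(build_validate_request(api_key, site), timeout=10) as resp:
            return bool(json.load(resp).get("valid"))
    except urllib.error.HTTPError as err:
        if err.code == 403:
            return False
        raise
```

This only checks the API key. The application key is exercised by endpoints that read data, such as metric queries.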

Data Collected

Metrics

federatorai.integration.status
(gauge)
Integration status that shows Federator.ai health.
federatorai.recommendation
(gauge)
Recommended number of Deployment/StatefulSet replicas.
federatorai.prediction.kafka
(gauge)
Workload prediction for Kafka metrics.
federatorai.kafka.broker_offset_rate
(gauge)
The delta of kafka.broker_offset timeseries in one minute.
federatorai.kafka.consumer_offset_rate
(gauge)
The delta of kafka.consumer_offset timeseries in one minute.
federatorai.prediction.node
(gauge)
Workload prediction for a Kubernetes node.
federatorai.prediction.node.avg
(gauge)
The average value of workload predictions for a Kubernetes node over a prediction window.
federatorai.prediction.node.min
(gauge)
The minimum value of workload predictions for a Kubernetes node over a prediction window.
federatorai.prediction.node.max
(gauge)
The maximum value of workload predictions for a Kubernetes node over a prediction window.
federatorai.prediction.controller
(gauge)
Workload prediction for a specific controller.
federatorai.prediction.controller.avg
(gauge)
The average value of workload predictions for a specific controller over a prediction window.
federatorai.prediction.controller.min
(gauge)
The minimum value of workload predictions for a specific controller over a prediction window.
federatorai.prediction.controller.max
(gauge)
The maximum value of workload predictions for a specific controller over a prediction window.
federatorai.prediction.nginx_ingress_controller_request_rate
(gauge)
Workload prediction of the request rate for the upstream service of NGINX Ingress.
federatorai.resource_planning.node
(gauge)
Workload predictions for resource planning of a Kubernetes node.
federatorai.resource_planning.controller
(gauge)
Workload predictions for resource planning of a Kubernetes controller.
federatorai.recommendation.instance
(gauge)
Cost of a recommended cloud instance.
federatorai.cost_analysis.instance.cost
(gauge)
Cost analysis for a cloud instance.
federatorai.cost_analysis.namespace.cost
(gauge)
Cost analysis for a namespace in a Kubernetes cluster
federatorai.prediction.namespace.cost
(gauge)
Cost prediction for a namespace in a Kubernetes cluster
federatorai.kubernetes.cpu.usage.total.controller
(gauge)
The CPU usage (in millicores) of the Kubernetes controller.
federatorai.kubernetes.memory.usage.controller
(gauge)
The memory usage (in bytes) of the Kubernetes controller.
federatorai.kubernetes.cpu.usage.total.node
(gauge)
The CPU usage (in millicores) of the Kubernetes node.
federatorai.kubernetes.memory.usage.node
(gauge)
The memory usage (in bytes) of the Kubernetes node.
federatorai.cost_analysis.resource_alloc_cost.cluster
(gauge)
The cost per hour/per 6 hours/per day based on resource allocation of a Kubernetes cluster for daily/weekly/monthly cost analysis
federatorai.cost_analysis.resource_alloc_cost.node
(gauge)
The cost per hour/per 6 hours/per day based on resource allocation of a Kubernetes node for daily/weekly/monthly cost analysis
federatorai.cost_analysis.resource_alloc_cost.namespace
(gauge)
The cost per hour/per 6 hours/per day based on resource allocation of a Kubernetes namespace for daily/weekly/monthly cost analysis
federatorai.cost_analysis.resource_usage_cost.cluster
(gauge)
The cost per hour/per 6 hours/per day based on resource usage of a Kubernetes cluster for daily/weekly/monthly cost analysis
federatorai.cost_analysis.resource_usage_cost.node
(gauge)
The cost per hour/per 6 hours/per day based on resource usage of a Kubernetes node for daily/weekly/monthly cost analysis
federatorai.cost_analysis.resource_usage_cost.namespace
(gauge)
The cost per hour/per 6 hours/per day based on resource usage of a Kubernetes namespace for daily/weekly/monthly cost analysis
federatorai.cost_analysis.cost_per_day.cluster
(gauge)
The cost of the entire 24 hours based on resource allocation of a Kubernetes cluster
federatorai.cost_analysis.cost_per_day.node
(gauge)
The cost of the entire 24 hours based on resource allocation of a Kubernetes node
federatorai.cost_analysis.cost_per_day.namespace
(gauge)
The cost of the entire 24 hours based on resource allocation of a Kubernetes namespace
federatorai.cost_analysis.cost_per_week.cluster
(gauge)
The cost of the entire 7 days based on resource allocation of a Kubernetes cluster
federatorai.cost_analysis.cost_per_week.node
(gauge)
The cost of the entire 7 days based on resource allocation of a Kubernetes node
federatorai.cost_analysis.cost_per_week.namespace
(gauge)
The cost of the entire 7 days based on resource allocation of a Kubernetes namespace
federatorai.cost_analysis.cost_per_month.cluster
(gauge)
The cost of the entire 30 days based on resource allocation of a Kubernetes cluster
federatorai.cost_analysis.cost_per_month.node
(gauge)
The cost of the entire 30 days based on resource allocation of a Kubernetes node
federatorai.cost_analysis.cost_per_month.namespace
(gauge)
The cost of the entire 30 days based on resource allocation of a Kubernetes namespace
federatorai.cost_analysis.cost_efficiency_per_day.cluster
(gauge)
The cost efficiency for the entire 24 hours based on resource allocation of a Kubernetes cluster
federatorai.cost_analysis.cost_efficiency_per_day.node
(gauge)
The cost efficiency for the entire 24 hours based on resource allocation of a Kubernetes node
federatorai.cost_analysis.cost_efficiency_per_day.namespace
(gauge)
The cost efficiency for the entire 24 hours based on resource allocation of a Kubernetes namespace
federatorai.cost_analysis.cost_efficiency_per_week.cluster
(gauge)
The cost efficiency for the entire 7 days based on resource allocation of a Kubernetes cluster
federatorai.cost_analysis.cost_efficiency_per_week.node
(gauge)
The cost efficiency for the entire 7 days based on resource allocation of a Kubernetes node
federatorai.cost_analysis.cost_efficiency_per_week.namespace
(gauge)
The cost efficiency for the entire 7 days based on resource allocation of a Kubernetes namespace
federatorai.cost_analysis.cost_efficiency_per_month.cluster
(gauge)
The cost efficiency for the entire 30 days based on resource allocation of a Kubernetes cluster
federatorai.cost_analysis.cost_efficiency_per_month.node
(gauge)
The cost efficiency for the entire 30 days based on resource allocation of a Kubernetes node
federatorai.cost_analysis.cost_efficiency_per_month.namespace
(gauge)
The cost efficiency for the entire 30 days based on resource allocation of a Kubernetes namespace
federatorai.recommendation.cost_analysis.cost_per_day.cluster
(gauge)
The estimated cost of the entire 24 hours based on Federator.ai recommendation for a Kubernetes cluster
federatorai.recommendation.cost_analysis.cost_per_day.node
(gauge)
The estimated cost of the entire 24 hours based on Federator.ai recommendation for a Kubernetes node
federatorai.recommendation.cost_analysis.cost_per_day.namespace
(gauge)
The estimated cost of the entire 24 hours based on Federator.ai recommendation for a Kubernetes namespace
federatorai.recommendation.cost_analysis.cost_per_week.cluster
(gauge)
The estimated cost of the entire 7 days based on Federator.ai recommendation for a Kubernetes cluster
federatorai.recommendation.cost_analysis.cost_per_week.node
(gauge)
The estimated cost of the entire 7 days based on Federator.ai recommendation for a Kubernetes node
federatorai.recommendation.cost_analysis.cost_per_week.namespace
(gauge)
The estimated cost of the entire 7 days based on Federator.ai recommendation for a Kubernetes namespace
federatorai.recommendation.cost_analysis.cost_per_month.cluster
(gauge)
The estimated cost of the entire 30 days based on Federator.ai recommendation for a Kubernetes cluster
federatorai.recommendation.cost_analysis.cost_per_month.node
(gauge)
The estimated cost of the entire 30 days based on Federator.ai recommendation for a Kubernetes node
federatorai.recommendation.cost_analysis.cost_per_month.namespace
(gauge)
The estimated cost of the entire 30 days based on Federator.ai recommendation for a Kubernetes namespace
federatorai.recommendation.cost_analysis.cost_efficiency_per_day.cluster
(gauge)
The cost efficiency for the entire 24 hours based on Federator.ai recommendation for a Kubernetes cluster
federatorai.recommendation.cost_analysis.cost_efficiency_per_day.namespace
(gauge)
The cost efficiency for the entire 24 hours based on Federator.ai recommendation for a Kubernetes namespace
federatorai.recommendation.cost_analysis.cost_efficiency_per_week.cluster
(gauge)
The cost efficiency for the entire 7 days based on Federator.ai recommendation for a Kubernetes cluster
federatorai.recommendation.cost_analysis.cost_efficiency_per_week.namespace
(gauge)
The cost efficiency for the entire 7 days based on Federator.ai recommendation for a Kubernetes namespace
federatorai.recommendation.cost_analysis.cost_efficiency_per_month.cluster
(gauge)
The cost efficiency for the entire 30 days based on Federator.ai recommendation for a Kubernetes cluster
federatorai.recommendation.cost_analysis.cost_efficiency_per_month.namespace
(gauge)
The cost efficiency for the entire 30 days based on Federator.ai recommendation for a Kubernetes namespace
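Once the integration is reporting, any of the metrics above can be read back through the Datadog timeseries query endpoint (`GET /api/v1/query`), which takes `from`, `to`, and `query` parameters and both the API and application keys. The sketch below only builds the request. The `avg:` aggregator, the `{*}` scope, the one-hour window, and the `datadoghq.com` site are assumptions chosen for illustration, and the helper name is hypothetical.

```python
import time
import urllib.parse
import urllib.request
from typing import Optional


def build_metric_query(metric: str, api_key: str, app_key: str,
                       window_seconds: int = 3600, now: Optional[int] = None,
                       site: str = "datadoghq.com") -> urllib.request.Request:
    """Build a timeseries query request for one metric over the trailing window."""
    end = int(now if now is not None else time.time())
    params = urllib.parse.urlencode({
        "from": end - window_seconds,   # POSIX seconds, start of the window
        "to": end,                      # POSIX seconds, end of the window
        "query": f"avg:{metric}{{*}}",  # unscoped average across all tags
    })
    return urllib.request.Request(
        f"https://api.{site}/api/v1/query?{params}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )


req = build_metric_query("federatorai.prediction.node", "<API_KEY>", "<APP_KEY>")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` sends it. In practice the scope would be narrowed with whatever tags your clusters report instead of `{*}`.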

Service Checks

Federator.ai does not include any service checks.

Events

Federator.ai does not include any events.

Troubleshooting

Need help? Read the Federator.ai - Installation and Configuration Guide or contact Datadog support.
