Hawk-eye

Fabriscale Hawk-eye is a real-time InfiniBand monitoring and analytics platform that provides visual insight into the status of your InfiniBand cluster.

  • Hawk-eye gives you a real-time overview of system performance, helps you visualise your topology, and lets you drill down into the statistics, alerts and key metrics of your HPC architecture.
  • Hawk-eye automates the monitoring of your InfiniBand network, and the system raises alarms (link failures, port error rates, congestion notifications, etc.) only when the operator’s attention is required.
  • Hawk-eye enables a 360-degree view of your HPC infrastructure and keeps you updated on the status of your installation 24/7.
  • Hawk-eye integrates seamlessly with Slurm, Torque, and other workload managers, using job scheduling information to visualise jobs in the cluster, identify potential job-specific network bottlenecks, and conduct job management.

Hawk-eye saves the operator time and leads to faster error recovery, less strain on key operator resources, and ultimately reduced downtime for your cluster.

The Hawk-eye dashboard gives the operator a quick overview of the state and performance of the cluster, and serves as an entry point for diving into alerts and statistics when required.
With Hawk-eye alerts, you are notified when critical events occur, and you can create your own alerts with custom thresholds.
Hawk-eye makes it easy to get an overview of your topology. You can search for devices and visualise link failures, link load, server status, etc.
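
To make the idea of alerts with custom thresholds more concrete, here is a minimal Python sketch of how a threshold rule could be evaluated against per-port metrics. It is not Hawk-eye's API or configuration format: the AlertRule class, the evaluate function, the metric names (link_load_pct, symbol_errors_per_s) and the port labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule: fire when a metric rises above a threshold."""
    name: str
    metric: str       # assumed metric name, not a Hawk-eye identifier
    threshold: float


def evaluate(rules, samples):
    """Return (rule name, port, value) for every sample above its threshold."""
    fired = []
    for rule in rules:
        for port, metrics in samples.items():
            value = metrics.get(rule.metric)
            # Ports that do not report the metric are skipped.
            if value is not None and value > rule.threshold:
                fired.append((rule.name, port, value))
    return fired


# Two operator-defined rules with custom thresholds (illustrative values).
rules = [
    AlertRule("high-link-load", "link_load_pct", 90.0),
    AlertRule("port-errors", "symbol_errors_per_s", 10.0),
]

# Made-up metric samples keyed by a made-up port label.
samples = {
    "switch1/port3": {"link_load_pct": 97.5, "symbol_errors_per_s": 0.0},
    "switch2/port7": {"link_load_pct": 41.0, "symbol_errors_per_s": 25.0},
}

alerts = evaluate(rules, samples)
for name, port, value in alerts:
    print(f"ALERT {name}: {port} = {value}")
```

In this sketch only ports that cross a rule's threshold produce output, which mirrors the idea that alarms are raised only when the operator's attention is required.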