ServiceNow Implementation Stories

Introduction

In today’s fast-paced IT environments, efficient workload management is crucial for maintaining operational excellence and minimizing disruptions. IBM Workload Scheduler (IWS), formerly known as Tivoli Workload Scheduler, is a powerful tool for automating and managing complex batch workloads across various platforms. However, when issues arise, they can quickly escalate into significant operational challenges and costs. This article presents a real-world scenario, implemented in 2018, in which ServiceNow automation was used to address recurring problems with IWS, resulting in substantial time and cost savings for the business.

IBM Workload Scheduler Overview

IBM Workload Scheduler is an enterprise-grade workload automation solution that helps organizations schedule, execute, and manage batch workloads across diverse IT environments. It provides centralized control over job scheduling, offering features such as event-driven scheduling, workload forecasting, and real-time monitoring. Despite its robust capabilities, IWS can experience issues that lead to job failures and system disruptions.

The Challenge

Our organization faced a critical problem: multiple jobs were failing on servers managed by IWS. Each incident (INC) cost approximately $8 per person to resolve, and around 11,000 INCs occurred every month. This translated to a staggering $88,000 per month, or more than $1 million annually ($88,000 × 12 = $1,056,000), in incident management costs. Moreover, each INC took 2-3 days to resolve through a labor-intensive process:

  1. Stop affected jobs
  2. Log into the application
  3. Fetch relevant logs
  4. Analyze error messages
  5. Take appropriate action

This manual approach was not only time-consuming but also prone to human error and inconsistencies in problem resolution.
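The cost figures in the challenge description reduce to simple arithmetic. The Python sketch below reproduces them, assuming a flat $8 per INC (one person per incident, which is what the $88,000 monthly figure implies); the constant and function names are illustrative only.

```python
# Back-of-the-envelope incident cost model using the figures quoted above.
# Assumption: a flat cost of $8 per INC (one person per incident).
COST_PER_INC_USD = 8
INCS_PER_MONTH = 11_000


def monthly_cost(incs_per_month: int, cost_per_inc: int = COST_PER_INC_USD) -> int:
    """Return the monthly incident-handling cost in USD."""
    return incs_per_month * cost_per_inc


def annual_cost(incs_per_month: int, cost_per_inc: int = COST_PER_INC_USD) -> int:
    """Return the annual incident-handling cost in USD (12 months)."""
    return 12 * monthly_cost(incs_per_month, cost_per_inc)


print(monthly_cost(INCS_PER_MONTH))  # 88000
print(annual_cost(INCS_PER_MONTH))   # 1056000
```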

The Solution: A Phased ServiceNow Implementation

To address these challenges, I developed a solution built on ServiceNow. The implementation was carried out in multiple phases, each building on the previous one, to create an automated system for managing IWS-related incidents.

Phase 1: Self-Service Form Development

I began by creating a self-service form in ServiceNow to manage IWS limits. The form let users set IWS limits to zero (which stops new jobs from launching) or back to their normal values through REST API calls. Key features included:

  • Default values to minimize human error during submissions
  • User-friendly interface for easy adoption
  • Integration with IWS through REST messages

I then conducted thorough training sessions to educate the support team on using this new tool effectively.
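Behind the form, each submission becomes a single REST call to IWS. The sketch below builds, but does not send, such a call with Python's standard library. The host, the `/api/limits/...` path, the JSON body, the bearer-token header, and the normal limit of 10 are hypothetical placeholders, not the actual IWS REST API; the real endpoint and schema depend on the IWS version in use.

```python
import json
import urllib.request

# Hypothetical values: the real IWS host, REST path and payload schema depend
# on the IWS version and are NOT taken from the article.
IWS_BASE_URL = "https://iws.example.com"
NORMAL_LIMIT = 10  # hypothetical "normal" limit, used as the form's default


def build_limit_request(workstation: str, limit: int = NORMAL_LIMIT,
                        token: str = "") -> urllib.request.Request:
    """Build (but do not send) a REST call that sets a workstation's job limit."""
    if limit < 0:
        raise ValueError("limit must be zero or a positive integer")
    body = json.dumps({"workstation": workstation, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{IWS_BASE_URL}/api/limits/{workstation}",  # hypothetical path
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


# Limit 0 stops new jobs from launching; the default restores normal processing.
stop_req = build_limit_request("WS_BATCH01", limit=0)
resume_req = build_limit_request("WS_BATCH01")
# urllib.request.urlopen(stop_req) would actually send the call.
```

Giving `limit` a default mirrors the form's default values: a user resuming jobs only has to pick the workstation, which leaves less room for typos.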

Phase 2: Automated Incident Detection and Response

In this phase, I implemented an automated system to detect and respond to incident spikes:

  • Developed a scheduled script to monitor for more than 15 INCs occurring within a 3-minute window
  • If triggered, the script automatically sets IWS limits to zero using the self-service form, effectively stopping additional INCs from being generated
  • Support team is alerted to investigate the root cause
  • Once the issue is resolved, the support team resumes job processing by setting the limits back to their normal values through the self-service form

This approach reduced the monthly ticket volume from 11,000 to 2,000, allowing the support team to focus on resolving one incident at a time.
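The spike rule in Phase 2 (more than 15 INCs within 3 minutes) is a sliding-window count. The Python sketch below keeps incident creation times in a deque and reports when the window holds more than 15 entries. The article's version ran as a scheduled ServiceNow script; here the class, the hypothetical `stop_iws` and `alert_support` callbacks, and the assumption that timestamps arrive in chronological order are illustrative.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Callable, Deque

WINDOW = timedelta(minutes=3)
THRESHOLD = 15  # trip when MORE than 15 INCs fall inside the window


class SpikeDetector:
    """Sliding-window counter over INC creation timestamps (assumed in order)."""

    def __init__(self, window: timedelta = WINDOW, threshold: int = THRESHOLD):
        self.window = window
        self.threshold = threshold
        self._times: Deque[datetime] = deque()

    def record(self, created_at: datetime) -> bool:
        """Add one INC and return True if the window now exceeds the threshold."""
        self._times.append(created_at)
        # Evict INCs older than `window` relative to the newest one.
        while created_at - self._times[0] > self.window:
            self._times.popleft()
        return len(self._times) > self.threshold


def handle_incident(detector: SpikeDetector, created_at: datetime,
                    stop_iws: Callable[[], None],
                    alert_support: Callable[[], None]) -> bool:
    """Hypothetical glue: set IWS limits to zero and alert support on a spike."""
    if detector.record(created_at):
        stop_iws()        # e.g. submit the self-service form with limit 0
        alert_support()   # support investigates the root cause
        return True
    return False
```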

Phase 3: Incident Automation Identification

I implemented business rules in ServiceNow to identify INCs that could be automated:

  • Created conditions to match specific incident patterns
  • Automated notifications to the support team indicating “This INC can be automated”
  • Conducted training sessions to familiarize the support team with this new feature
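A business rule of this kind amounts to matching the incident text against known patterns. The Python sketch below shows the idea; the pattern names and the generic Unix error strings are hypothetical examples, since the article does not list the actual conditions, and in ServiceNow the equivalent logic would live in a business rule rather than a standalone function.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns; the real conditions were specific to the environment.
AUTOMATABLE_PATTERNS: Dict[str, re.Pattern] = {
    "disk_full": re.compile(r"no space left on device", re.IGNORECASE),
    "connection_refused": re.compile(r"connection refused", re.IGNORECASE),
    "permission_denied": re.compile(r"permission denied", re.IGNORECASE),
}


def classify_incident(description: str) -> Optional[str]:
    """Return the name of the first matching pattern, or None."""
    for name, pattern in AUTOMATABLE_PATTERNS.items():
        if pattern.search(description):
            return name
    return None


def work_note_for(description: str) -> Optional[str]:
    """Build the notification text for the support team, if the INC matches."""
    category = classify_incident(description)
    if category is None:
        return None
    return f"This INC can be automated (pattern: {category})"
```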

Phase 4: Automated Log Retrieval

To further streamline the incident resolution process, I developed an automation to:

  • Log into IWS upon incident detection
  • Retrieve relevant logs
  • Automatically attach logs to the corresponding INC in ServiceNow

This eliminated the need for manual log retrieval, saving significant time and reducing the potential for human error.
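Attaching a retrieved log to an INC maps onto ServiceNow's Attachment API (`POST /api/now/attachment/file` with `table_name`, `table_sys_id`, and `file_name` query parameters). The sketch below builds that request without sending it. The instance URL is a placeholder, `fetch_log` stands in for a hypothetical routine that logs into IWS and pulls the job log, and the caller is assumed to supply the authorization header.

```python
import urllib.parse
import urllib.request
from typing import Callable

SN_INSTANCE = "https://example.service-now.com"  # hypothetical instance URL


def build_attachment_request(incident_sys_id: str, file_name: str,
                             content: bytes, auth_header: str) -> urllib.request.Request:
    """Build (but do not send) an Attachment API upload for an incident record."""
    query = urllib.parse.urlencode({
        "table_name": "incident",
        "table_sys_id": incident_sys_id,
        "file_name": file_name,
    })
    return urllib.request.Request(
        url=f"{SN_INSTANCE}/api/now/attachment/file?{query}",
        data=content,
        method="POST",
        headers={"Content-Type": "text/plain",
                 "Accept": "application/json",
                 "Authorization": auth_header},
    )


def attach_job_log(incident_sys_id: str, job_name: str,
                   fetch_log: Callable[[str], str],
                   auth_header: str) -> urllib.request.Request:
    """Fetch a job log (hypothetical IWS call) and prepare it as an attachment."""
    log_text = fetch_log(job_name)
    return build_attachment_request(incident_sys_id, f"{job_name}.log",
                                    log_text.encode("utf-8"), auth_header)
```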

Phase 5: Error-Action Mapping and Automated Resolution

In this critical phase, I created a comprehensive table mapping common errors to their required resolution actions:

  • Developed automations to perform necessary operations on affected servers based on identified errors
  • Implemented automated email notifications to relevant teams, detailing the error message and actions taken
  • Continued to educate the support team on these new automated processes
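The error-to-action table is essentially a lookup from an error signature to a remediation plus a notification. The Python sketch below models it with a dictionary; the error strings, actions, and email addresses are hypothetical, the remediation functions only record what they would do instead of touching a server, and the email is built with the standard library's `EmailMessage` without being sent.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Callable, Dict, List, Optional

ACTIONS_LOG: List[str] = []  # stands in for real server operations


def clear_temp_files(server: str) -> None:
    ACTIONS_LOG.append(f"cleared temp files on {server}")  # hypothetical action


def restart_agent(server: str) -> None:
    ACTIONS_LOG.append(f"restarted agent on {server}")  # hypothetical action


@dataclass(frozen=True)
class Remediation:
    description: str
    run: Callable[[str], None]


# Hypothetical mapping; the real table grew as new errors were catalogued.
ERROR_ACTION_MAP: Dict[str, Remediation] = {
    "No space left on device": Remediation("Clear temporary files", clear_temp_files),
    "Connection refused": Remediation("Restart the scheduling agent", restart_agent),
}


def remediate(error_message: str, server: str) -> Optional[EmailMessage]:
    """Run the mapped action and return the notification email, or None."""
    for signature, remediation in ERROR_ACTION_MAP.items():
        if signature in error_message:
            remediation.run(server)
            msg = EmailMessage()
            msg["Subject"] = f"[Auto-remediated] {server}: {signature}"
            msg["From"] = "automation@example.com"   # hypothetical address
            msg["To"] = "batch-support@example.com"  # hypothetical address
            msg.set_content(f"Error: {error_message}\n"
                            f"Action taken: {remediation.description}")
            return msg
    return None
```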

Phases 6-8: Expansion and Refinement

These phases focused on expanding the scope of our automation:

  • Rolled out the solution to more servers with similar common errors
  • Continuously added new error messages and corresponding actions to the mapping table
  • Implemented dynamic change management (CHG) integration, allowing automations to create and manage change requests as needed
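Creating a change request from an automation can go through ServiceNow's Table API (`POST /api/now/table/change_request`). The sketch below builds such a request without sending it. The instance URL is a placeholder, and the `standard` change type and the field wording are assumptions; which change model fits depends on the organization's change policy.

```python
import json
import urllib.request

SN_INSTANCE = "https://example.service-now.com"  # hypothetical instance URL


def build_change_request(server: str, action: str, incident_number: str,
                         auth_header: str) -> urllib.request.Request:
    """Build (but do not send) a Table API call that opens a change request."""
    payload = {
        "short_description": f"Automated remediation: {action} on {server}",
        "description": f"Raised automatically while resolving {incident_number}.",
        # Assumption: remediation runs as a pre-approved standard change.
        "type": "standard",
    }
    return urllib.request.Request(
        url=f"{SN_INSTANCE}/api/now/table/change_request",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": auth_header,
        },
    )
```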

Phase 9: Metrics and Reporting

In the final phase, I developed comprehensive dashboards for leadership, providing key metrics on:

  • Automation vs. manual incident resolution
  • System maintenance efficiency
  • Scalability of the solution
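The headline dashboard metric, automated versus manual resolution, can be computed from closed-incident records. The Python sketch below assumes each record carries a hypothetical `automated` flag and a resolution duration; in practice these figures would come from ServiceNow reports rather than a standalone script.

```python
from dataclasses import dataclass
from datetime import timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ClosedIncident:
    automated: bool              # hypothetical flag set by the automation
    resolution_time: timedelta   # time from creation to closure


def _median_minutes(items: List[ClosedIncident]) -> Optional[float]:
    if not items:
        return None
    return median(i.resolution_time.total_seconds() / 60 for i in items)


def summarize(incidents: List[ClosedIncident]) -> Dict[str, Optional[float]]:
    """Share of automated closures and median resolution time per path."""
    automated = [i for i in incidents if i.automated]
    manual = [i for i in incidents if not i.automated]
    total = len(incidents)
    return {
        "total": total,
        "automation_rate": len(automated) / total if total else None,
        "median_automated_minutes": _median_minutes(automated),
        "median_manual_minutes": _median_minutes(manual),
    }
```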

Key Metrics and Results

The implementation of this ServiceNow-based solution yielded the following results:

  • Time Saved: Reduced incident resolution time from 2-3 days to 2-3 minutes
  • Cost Savings: Approximately $600,000 to $800,000 saved annually through automated incident closure and reduced manual intervention
  • Reduced Disruption: Significantly decreased the number of support team members required for incident review, allowing resources to be allocated to more critical tasks

Other Automations

The implementation of ServiceNow automations across various business processes demonstrates the platform’s potential to drive significant improvements in efficiency, accuracy, and user experience. By transforming legacy processes, integrating complex systems, and leveraging AI capabilities, organizations can achieve a new level of operational excellence.

The success of these implementations highlights the importance of a strategic approach to automation, focusing on reusability, security, and comprehensive integration. As organizations continue to navigate the challenges of digital transformation, platforms like ServiceNow will play an increasingly crucial role in streamlining operations and driving innovation.

Non-Disclosure Agreement Automation

One of the most impactful automations implemented was the transformation of the Non-Disclosure Agreement (NDA) process. The legacy process was characterized by:

  • Sign-off requirements from over 50 teams
  • Manual file transfers using Excel documents
  • PDF conversions and local drive storage
  • A lengthy process duration of at least one year

The ServiceNow implementation dramatically streamlined this process, reducing the time required and eliminating manual interventions. By automating the workflow, sign-offs were coordinated efficiently, and document management was centralized within the ServiceNow platform. This not only accelerated the process but also improved transparency and reduced the risk of errors associated with manual handling.
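Coordinating sign-offs from many teams reduces to tracking one decision per required team. The Python sketch below models that state; the team names, the decision values, and the rule that a single rejection halts the request are assumptions for illustration, not details of the actual ServiceNow workflow.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List

VALID_DECISIONS = {"approved", "rejected"}


@dataclass
class NdaRequest:
    required_teams: FrozenSet[str]
    decisions: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def for_teams(cls, teams: Iterable[str]) -> "NdaRequest":
        return cls(frozenset(teams))

    def record(self, team: str, decision: str) -> None:
        """Store one team's sign-off decision."""
        if team not in self.required_teams:
            raise ValueError(f"{team} is not a required approver")
        if decision not in VALID_DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        self.decisions[team] = decision

    @property
    def pending(self) -> List[str]:
        """Teams that have not signed off yet, sorted for stable display."""
        return sorted(self.required_teams.difference(self.decisions))

    @property
    def status(self) -> str:
        # Assumption: any rejection stops the request.
        if "rejected" in self.decisions.values():
            return "rejected"
        return "complete" if not self.pending else "pending"
```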

PCF Automations: Integrating ServiceNow with HashiCorp Vault

Another significant automation involved integrating ServiceNow with HashiCorp Vault for managing tokens and interacting with Pivotal Cloud Foundry (PCF). This implementation included:

  • Secure token management through HashiCorp Vault integration
  • Python-scripted PCF CLI commands for application and instance management
  • Data migration from PCF to ServiceNow for enhanced self-service capabilities
  • Scheduled workflows for initial data collection using pagination
  • Dynamic data creation through self-service request forms

This automation significantly improved the management of PCF applications and instances. By moving data to ServiceNow, the team created more intuitive self-service forms, reducing human errors in submissions. The scheduled workflows and dynamic data creation kept this information current and readily available.
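The pagination-based initial load can be sketched against the Cloud Foundry v3 API, whose list endpoints (such as `/v3/apps`) return a `resources` array plus a `pagination.next.href` link that is null on the last page. In the sketch below, `fetch_json` is a hypothetical callable that performs an authenticated GET (for example with a token obtained from HashiCorp Vault) and returns parsed JSON; the API host is a placeholder.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

CF_API = "https://api.cf.example.com"  # hypothetical CF API endpoint

FetchJson = Callable[[str], Dict[str, Any]]


def iter_resources(fetch_json: FetchJson, path: str = "/v3/apps",
                   per_page: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield every resource from a paginated CF v3 list endpoint."""
    url: Optional[str] = f"{CF_API}{path}?per_page={per_page}"
    while url:
        page = fetch_json(url)
        yield from page.get("resources", [])
        next_link = page.get("pagination", {}).get("next")
        url = next_link["href"] if next_link else None


def app_rows(fetch_json: FetchJson) -> List[Dict[str, str]]:
    """Flatten apps into rows suitable for loading into a ServiceNow table."""
    return [{"guid": app["guid"], "name": app["name"], "state": app.get("state", "")}
            for app in iter_resources(fetch_json)]
```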

HashiCorp Vault Account Creation Automation

The process of creating accounts with the HashiCorp Vault team was also automated using ServiceNow. Previously, this process involved:

  • Manual outreach to application teams
  • Creation of paths and manual token retrieval

The new automation leveraged ServiceNow’s self-service forms and scheduled workflows to streamline the account creation process. This not only reduced the time required for account setup but also minimized the workload on application teams.
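The two manual steps listed above, creating a path and retrieving a token, correspond to Vault HTTP API calls: writing an ACL policy (`PUT /v1/sys/policies/acl/:name`) and creating a token bound to it (`POST /v1/auth/token/create`). The sketch below builds both requests without sending them. The Vault address, the `secret/` KV v2 mount, the per-application path layout, the policy naming, and the TTL are all hypothetical conventions.

```python
import json
import urllib.request
from typing import Tuple

VAULT_ADDR = "https://vault.example.com"  # hypothetical Vault address


def _vault_request(method: str, path: str, body: dict,
                   admin_token: str) -> urllib.request.Request:
    return urllib.request.Request(
        url=f"{VAULT_ADDR}{path}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"X-Vault-Token": admin_token, "Content-Type": "application/json"},
    )


def onboarding_requests(app_id: str, admin_token: str, ttl: str = "720h"
                        ) -> Tuple[urllib.request.Request, urllib.request.Request]:
    """Build the policy-write and token-create calls for one application."""
    policy_name = f"{app_id}-read"  # hypothetical naming convention
    # Hypothetical layout: each app reads secrets under secret/data/apps/<app_id>/ (KV v2).
    policy_hcl = (
        f'path "secret/data/apps/{app_id}/*" {{\n'
        '  capabilities = ["read"]\n'
        "}\n"
    )
    write_policy = _vault_request("PUT", f"/v1/sys/policies/acl/{policy_name}",
                                  {"policy": policy_hcl}, admin_token)
    create_token = _vault_request("POST", "/v1/auth/token/create",
                                  {"policies": [policy_name], "ttl": ttl}, admin_token)
    return write_policy, create_token
```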

Overcoming Integration Challenges

Implementing these automations was not without challenges. I faced various hurdles, including:

  • IP whitelisting between servers
  • Navigating complex application documentation
  • Conducting extensive research and development
  • Rigorous testing within and across environments

To address these challenges, the team focused on building reusable components, including flows, actions, and spokes. Careful attention was paid to credential management to ensure security throughout the integration process.

Comprehensive Integration and AI Training

The project’s scope extended beyond individual process automations. Key achievements included:

  • Integration of more than 50 applications with ServiceNow
  • Creation of a single source of truth for organizational data
  • Training of AI systems for improved AIOps capabilities

This comprehensive approach to integration and data management laid the foundation for enhanced ITSM, ITOM, and ITAM processes. The focus on reusability in automations and integrations ensured that the solutions developed could be easily adapted and scaled across the organization.

Best Practices and Lessons Learned

Through this implementation, I identified several best practices for leveraging ServiceNow in IT operations:

  1. Automation Integration: Develop automations that can interact with third-party applications, create change requests as needed, and roll back to previous versions if failures occur.
  2. Self-Healing Capabilities: Implement automations to resolve minor issues without human intervention.
  3. Self-Service Emphasis: Utilize self-service forms for common requests and actions to reduce manual workload on the support team.
  4. Data-Driven Decision Making: Build dashboards based on incident data to inform strategic decisions and identify areas for further improvement.
  5. Change Management Integration: Incorporate change management processes into automations to ensure proper governance and control.
  6. Data Centralization: Move application data to ServiceNow tables to facilitate self-service and self-healing automations.
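The first practice, rolling back when an automated change fails, can be expressed as an apply, verify, roll-back wrapper. The Python sketch below is a generic pattern rather than the implementation used here; `apply`, `verify`, and `rollback` are hypothetical callables that each automation would supply.

```python
import logging
from typing import Callable

log = logging.getLogger("automation")


def apply_with_rollback(apply: Callable[[], None],
                        verify: Callable[[], bool],
                        rollback: Callable[[], None]) -> bool:
    """Apply a change, verify it, and restore the previous state on failure."""
    try:
        apply()
        if verify():
            return True
        log.warning("verification failed; rolling back")
    except Exception:
        log.exception("apply failed; rolling back")
    rollback()
    return False
```

Keeping `verify` separate from `apply` lets a change that "succeeds" but leaves the system unhealthy still trigger the rollback path.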

Community Development and Knowledge Sharing

To ensure long-term success and continuous improvement, I established a community development approach:

  • Implemented a code repository for sharing and version control
  • Created a comprehensive library of guidelines and documentation for utilizing reusable code snippets
  • Encouraged active participation and contributions from team members
  • Established a system for enrolling in updates to shared code resources

This approach has significantly reduced duplicate efforts among engineers and fostered a culture of collaboration and innovation.

Conclusion

By leveraging ServiceNow’s powerful capabilities and implementing a phased approach to automation, I successfully transformed our IWS incident management process. The results demonstrate the significant impact that strategic IT service management solutions can have on operational efficiency, cost reduction, and overall business performance. As organizations continue to face complex IT challenges, the lessons learned from this implementation serve as a valuable blueprint for driving continuous improvement and innovation in IT operations.