success stories

NOAA NWS Office of Dissemination 24×7 Cloud Support Contract

Background:

This contract covers Tier 2 operations, maintenance, and monitoring support for the ODIS public cloud-hosted applications. It also covers IT security support: upholding the integrity of cloud system security plans, identifying vulnerabilities, formulating mitigation strategies, and creating new security processes. The cloud environment runs on Amazon Web Services (AWS) and hosts five primary applications: Damage Assessment Toolkit (DAT), NWSChat, National GIS Map Viewer (“The Viewer”), Hydrologic Visualization and Information Services (HydroVIS), and Cloud GIS Web Services (CloudGIS). The Tier 2 service desk handles the cloud-related requirements of these applications and acts as the direct point of contact for all clients during incidents. Support is available 24 hours a day, 365 days a year. It includes operations, monitoring, incident response, troubleshooting, diagnosis, and resolution of issues in the AWS cloud infrastructure, carried out through monitoring and administrative tools.

At the contract’s inception, stakeholders were engaged to build a full understanding of the infrastructure through high-level application overviews and frequent discussions for each supported service. Particular effort went into understanding the NWSChat application, including the development of a detailed architecture diagram of its on-premises connectivity and resource deployment across multiple regions. N-Wave resources were also secured to strengthen NWSChat development and production support. Technical work included provisioning VPC endpoints, configuring internet access for private subnets, implementing CloudWatch monitoring dashboards, and putting security, compliance, and operational functions in place. A notification framework, including SMS and phone alerts, was established using AWS Systems Manager Incident Manager. In addition, the NWSChat code was deployed across two AWS regions on Amazon ElastiCache for Redis, Amazon MQ, and Amazon Aurora PostgreSQL.
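To illustrate the CloudWatch monitoring dashboards mentioned above, the following is a minimal sketch of how a dashboard definition covering the three managed services in two regions could be generated. The region names and resource identifiers are hypothetical placeholders (the source does not name them), and the metric selection is an assumption, not the configuration actually deployed.

```python
import json

# Hypothetical regions and resource identifiers, for illustration only.
REGIONS = ["us-east-1", "us-west-2"]
RESOURCES = [
    # (namespace, metric name, dimension name, dimension value)
    ("AWS/RDS", "CPUUtilization", "DBClusterIdentifier", "nwschat-aurora"),
    ("AWS/ElastiCache", "CPUUtilization", "CacheClusterId", "nwschat-redis"),
    ("AWS/AmazonMQ", "CpuUtilization", "Broker", "nwschat-mq"),
]


def build_dashboard_body(regions, resources, width=8, height=6):
    """Return a CloudWatch dashboard body (JSON text).

    Layout: one row per region, one metric widget per resource.
    The dashboard grid is 24 units wide, so three widgets of width 8 fill a row.
    """
    widgets = []
    for row, region in enumerate(regions):
        for col, (namespace, metric, dim_name, dim_value) in enumerate(resources):
            widgets.append({
                "type": "metric",
                "x": col * width,
                "y": row * height,
                "width": width,
                "height": height,
                "properties": {
                    "title": f"{dim_value} {metric} ({region})",
                    "region": region,
                    "metrics": [[namespace, metric, dim_name, dim_value]],
                    "stat": "Average",
                    "period": 300,  # seconds
                },
            })
    return json.dumps({"widgets": widgets})


body = build_dashboard_body(REGIONS, RESOURCES)
```

The resulting string could then be passed as `DashboardBody` to the CloudWatch `put_dashboard` call in boto3, together with a dashboard name of the operator's choosing.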

For the CloudGIS application, support covered deployment, logging, and monitoring solutions. Collaboration with AWS ProServe and the NWS Viewer team produced documented prerequisites for deploying AWS Transit Gateway in the nws-diss-gis-dev environment. Foundational CloudWatch monitoring for the GIS Viewer application was then rolled out, followed by a draft plan for implementing AWS WAF, which was completed and slated for review with the CloudGIS application team. Performance and load simulation testing complemented these enhancements.

The Tier 2 24/7 cloud support team was grounded in a Standard Operating Procedure (SOP). The SOP defines roles and responsibilities, escalation protocols, and on-call points of contact, in line with the stipulations of the PWS section. It was developed with input from all stakeholders and brought to completion by the GAMA-1 management team. ServiceNow was chosen as the incident management tool, supplemented by Redmine for issue and project tracking within the Tier 2 and Tier 3 support teams. After launch, a Google ring group was established, providing a consolidated point of contact for Tier 2 engineers and leads and improving communication and response rates.

In addition to delivering solutions and AWS support to the NWS Office of Dissemination, emphasis was placed on innovation in data ingest and integration, benefiting both NWS and NOAA. Collaboration with ServiceNow support enabled programmatic communication between AWS CloudWatch and ServiceNow incident management, with the aim of automating incident ticket generation from CloudWatch alerts. This work involved implementing the AWS Service Management Connector for ServiceNow and then configuring custom templates for incident auto-population.
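The idea behind incident auto-population can be sketched as a simple mapping from a CloudWatch alarm notification to ServiceNow incident fields. This is an illustration only and is not how the AWS Service Management Connector works internally. The input keys follow the JSON format CloudWatch uses for alarm notifications, and the output keys are standard ServiceNow incident fields; the assignment group name, the urgency and impact values, and the function name are assumptions made for this example.

```python
import json
from typing import Optional

# Hypothetical default; a real value would come from the SOP and the ServiceNow instance.
DEFAULT_ASSIGNMENT_GROUP = "Tier 2 Cloud Support"


def alarm_to_incident(sns_message: str) -> Optional[dict]:
    """Map a CloudWatch alarm notification (JSON text) to incident fields.

    Returns None for state changes other than ALARM, so that OK and
    INSUFFICIENT_DATA transitions do not open new tickets.
    """
    alarm = json.loads(sns_message)
    if alarm.get("NewStateValue") != "ALARM":
        return None

    trigger = alarm.get("Trigger", {})
    description = "\n".join([
        f"Region: {alarm.get('Region', 'unknown')}",
        f"Metric: {trigger.get('Namespace', '?')}/{trigger.get('MetricName', '?')}",
        f"Changed at: {alarm.get('StateChangeTime', 'unknown')}",
        f"Reason: {alarm.get('NewStateReason', 'not provided')}",
    ])
    return {
        "short_description": f"CloudWatch alarm: {alarm['AlarmName']}",
        "description": description,
        "urgency": "2",  # assumed default for this sketch
        "impact": "2",   # assumed default for this sketch
        "assignment_group": DEFAULT_ASSIGNMENT_GROUP,
    }
```

Filtering on the `ALARM` state keeps recovery notifications from creating duplicate tickets; a fuller version might instead use them to resolve the matching open incident.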

The security side of the contract involved regular participation in system tag-up meetings for the applications, including GIS Viewer, DAT, NWSChat, and HydroVIS. The IT security specialist conducted a comprehensive review of A&A documentation, including parent system documents and design/requirements documents. Participation in CCB meetings and collaboration with NWS Security staff contributed to a thorough understanding of NWSChat’s onboarding requirements and folder structure. Technical deep-dive sessions were held with each application team, yielding a detailed understanding of each application and its backend instances. Security support extended to firewall configuration, POA&M closure, development of enterprise documentation, and review of A&A findings. Risk tracking, FIPS 200 coverage, and partial TICAP implementation were also key efforts within the security domain.

Solution & Outcomes:

The solution takes a multifaceted approach to supporting the ODIS public cloud-hosted applications, providing a range of services that keep the cloud environment stable, secure, and running smoothly. Key elements of the solution include:

  1. Tier 2 Cloud Support Services: The Tier 2 service desk is established to provide round-the-clock support for the ODIS cloud-hosted applications. This involves monitoring, incident response, troubleshooting, and issue resolution, ensuring the continuity of operations across the AWS-hosted infrastructure.
  2. AWS Application Support and Enhancement: Specific applications such as NWSChat and CloudGIS receive targeted support and enhancement efforts. For NWSChat, proactive measures are taken to enhance architecture, secure resources, and optimize performance. For CloudGIS, deployment, logging, monitoring, and security measures are fine-tuned to ensure optimal performance.
  3. Standard Operating Procedure (SOP): A comprehensive SOP establishes clear roles, responsibilities, and escalation procedures for the Tier 2 support team. This document is a reference guide for effective incident management, optimizing communication and response times.
  4. Innovation and Automation: The solution actively pursues innovation by automating incident management processes through AWS CloudWatch and ServiceNow integration. This automation aims to improve incident response times and streamline incident tracking and resolution.
  5. Security Enhancement: The security solution involves thorough stakeholder engagement, deep dives into application architecture, and collaboration with NWS Security staff. It encompasses firewall configuration, compliance adherence, risk tracking, and the development of comprehensive security documentation.
  6. Stakeholder Engagement: Regular meetings with stakeholders, including high-level application overviews and tag-up sessions, ensure alignment and effective communication throughout the contract period.
  7. Operational Excellence: The solution emphasizes operational excellence by establishing a dedicated support structure, thorough documentation, proactive architecture enhancement, and constant monitoring to identify and mitigate potential issues.
  8. Continuous Improvement: The solution’s focus on innovation, automation, and security enhancement ensures continuous improvement over time. Feedback loops from stakeholders, A&A findings review, and engagement with NWS and NOAA contribute to refining processes and enhancing overall performance.

By combining these elements, the solution aims to provide a comprehensive, efficient, and reliable support structure for the ODIS public cloud-hosted applications. It ensures seamless operations, robust security, and continuous improvement, thereby contributing to the overall success of the cloud environment on AWS.

CLIENT
NOAA National Weather Service (NWS) Office of Dissemination
SERVICES
  • IT Operations and Maintenance – Tier 2 Cloud Support Services
  • Cloud Infrastructure & AWS Application Support and Enhancement
  • Standard Operating Procedure (SOP) Development & Maintenance
  • Digital Transformation Innovation and Automation
  • Cybersecurity Management
  • Stakeholder Engagement and Collaboration
  • Continuous Improvement
YEAR(S)
2022-Present