...

DelSubn: Delete subnet

DelNet: Delete network



[Image: operation flow diagram]

Some Heat templates are also used, but this is essentially what happens for most operations.
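
As a concrete illustration, the flow in the diagram (create a network and subnet, then DelSubn / DelNet to tear them down) could be written as a Rally scenario plugin. This is only a minimal sketch assuming the rally-openstack plugin API; the plugin name ScaleScenarios.create_and_delete_net, the CIDR, and the create steps mirroring the delete abbreviations are all assumptions, not taken from this page.

    from rally.task import atomic
    from rally_openstack import scenario


    @scenario.configure(name="ScaleScenarios.create_and_delete_net")
    class CreateAndDeleteNet(scenario.OpenStackScenario):
        """Create a network and subnet, then DelSubn / DelNet."""

        def run(self, cidr="10.2.0.0/24"):
            neutron = self.clients("neutron")

            # Create the network, then a subnet inside it, timing each step.
            with atomic.ActionTimer(self, "neutron.create_network"):
                net = neutron.create_network(
                    {"network": {"name": self.generate_random_name()}})
            with atomic.ActionTimer(self, "neutron.create_subnet"):
                subn = neutron.create_subnet(
                    {"subnet": {"network_id": net["network"]["id"],
                                "ip_version": 4, "cidr": cidr}})

            # DelSubn / DelNet: tear everything down in reverse order.
            with atomic.ActionTimer(self, "neutron.delete_subnet"):
                neutron.delete_subnet(subn["subnet"]["id"])
            with atomic.ActionTimer(self, "neutron.delete_network"):
                neutron.delete_network(net["network"]["id"])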

...

      1. Orchestration
        1. Ansible – create rally, shaker, os-faults, and metrics-gathering plugins
      2. Test Manager
        1. Resiliency Studio/Jenkins (AT&T proposed – targeted for open-sourcing Aug–Sept 2018)
        2. Start orchestrator runs
        3. Collect metrics
        4. Incorporate capability for SLA plugins
          1. SLA plugins decide whether a test is a success or failure (see the SLA plugin sketch after this list)
        5. Interact with GitHub/CI-CD environments
          1. Provide detailed logs and metrics for analysis
          2. File bugs
    1. Developer Tools
      1. Goal:
        1. Push as much as possible as far left into the development cycle
        2. Minimize resource utilization (financial & computational)
      2. Data Center Emulator
        1. Emulate reference architectures
        2. Run performance and failure injection scenarios
        3. Mathematically extrapolate to acceptable limits (see the extrapolation sketch just below)
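
A hedged sketch of the "mathematically extrapolate" idea: fit a KPI measured at small emulated scales, then project it to production scale to see whether it stays inside the acceptable limit. All numbers, the 1000 ms limit, and the linear-growth assumption are illustrative, not agreed values.

    import numpy as np

    nodes  = np.array([5, 10, 20, 40])        # emulated deployment sizes
    p90_ms = np.array([120, 140, 185, 270])   # measured 90th-pct API latency

    # Assume roughly linear growth and fit a first-degree polynomial.
    slope, intercept = np.polyfit(nodes, p90_ms, 1)

    target_nodes = 200                        # hypothetical production scale
    projected = slope * target_nodes + intercept
    print("projected p90 at %d nodes: %.0f ms" % (target_nodes, projected))
    print("within 1000 ms limit:", projected <= 1000)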
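
The SLA plugins mentioned under Test Manager decide pass/fail for a run. Below is a minimal sketch of what such a plugin could look like against Rally's SLA plugin interface; the plugin name max_error_rate is hypothetical (Rally ships a similar built-in failure_rate SLA).

    from rally.task import sla


    @sla.configure(name="max_error_rate")
    class MaxErrorRate(sla.SLA):
        """Pass only if the fraction of failed iterations <= criterion_value."""

        CONFIG_SCHEMA = {"type": "number", "minimum": 0.0, "maximum": 1.0}

        def __init__(self, criterion_value):
            super(MaxErrorRate, self).__init__(criterion_value)
            self.errors = 0
            self.total = 0

        def add_iteration(self, iteration):
            # Rally calls this once per scenario iteration with its result.
            self.total += 1
            if iteration.get("error"):
                self.errors += 1
            self.success = self.errors <= self.criterion_value * self.total
            return self.success

        def merge(self, other):
            # Combine partial results coming from parallel runners.
            self.errors += other.errors
            self.total += other.total
            if self.total:
                self.success = self.errors <= self.criterion_value * self.total
            return self.success

        def details(self):
            rate = 100.0 * self.errors / max(self.total, 1)
            return ("Error rate %.1f%% (limit %.1f%%) - %s"
                    % (rate, 100.0 * self.criterion_value, self.status()))
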
  1. Define test scope & scenarios
    1. KPI/SLA
      1. What metrics are part of the KPI matrix
        1. Examples: Control Plane – API response time, Success Rate, RabbitMQ connection distribution, CPU/Memory utilization, I/O rate, etc.
        2. Examples: Data Plane – throughput, vrouter failure characteristics, storage failure characteristics, memory congestion, scheduling latency, etc.
      2. What are the various bounds?
        1. Examples: Control Plane – RabbitMQ connection distribution should be uniform within a certain standard deviation; API response times are lognormally distributed and not acceptable past the 90th percentile (see the bounds-check sketch after this list); etc.
      3. Realistic Workload Models for control & data plane
      4. Realistic KPI models from operators
      5. Realistic outage scenarios
    2. Automated Test Case Generation
      1. Are there design & deployment templates that can be supplied so that an initial suite of scenarios is automatically generated?
      2. Top-down assessment methodology to generate the scenarios – it shouldn't burden developers with "paperwork".
    3. Performance
      1. Control Plane
      2. Data Plane
    4. Destructive
      1. Failure injection (see the os-faults sketch after this list)
    5. Scale
      1. Scale resources (cinder volumes, subnets, VMs, etc.)
    6. Concurrency
      1. Multiple requests on the same resource
    7. Operational Readiness
      1. What are we looking for here – just a shakeout to ensure a site is ready for operation? This may be a subset of the performance & resiliency tests.
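
To make the bound examples under KPI/SLA concrete, here is a small illustrative sketch of how two of the control-plane bounds could be checked. The thresholds are placeholders, not agreed SLA values.

    import numpy as np

    def rabbitmq_spread_ok(conns_per_node, max_rel_std=0.10):
        """Connections should be near-uniform: relative std. dev. under limit."""
        conns = np.asarray(conns_per_node, dtype=float)
        return conns.std() / conns.mean() <= max_rel_std

    def api_latency_ok(latencies_s, p90_limit_s=1.0):
        """Response times are lognormal-ish, so gate on the 90th percentile."""
        return np.percentile(latencies_s, 90) <= p90_limit_s

    # Example: three controllers with slightly uneven AMQP connection counts.
    print(rabbitmq_spread_ok([410, 395, 402]))        # True
    print(api_latency_ok([0.2, 0.4, 0.3, 0.9, 2.5]))  # False: p90 > 1 s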
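
For the failure-injection scope, a minimal sketch of driving os-faults directly from Python. The connection details are placeholders, and the available service names depend on the configured driver; os-faults also supports other fault types (node reboots, network partitions, degradations, etc.).

    import os_faults

    # Illustrative config – a real deployment would use a driver matching
    # its environment (ansible, devstack, etc.) and real addresses.
    cloud_config = {
        "cloud_management": {
            "driver": "devstack",
            "args": {"address": "10.0.0.5", "username": "stack"},
        }
    }

    destructor = os_faults.connect(cloud_config)
    destructor.verify()                      # check connectivity first

    # Inject a control-plane fault: restart the RabbitMQ service.
    service = destructor.get_service(name="rabbitmq")
    service.restart()
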
  2. Define reference architectures
    1. What are the reference architectures?
    2. Hardware variety – where is it located?
    3. Deployment toolset for creating repeatable experiments – ENoS exists for container-based deployments; what about other deployment types?
    4. Deployment, Monitoring & Alerting templates
  3. Implementation Priorities
    1. Tackle Control Plane & Software Faults (rally + os-faults)
      1. Most of the code is already there – more plugins are needed
      2. os-faults: More fault injection scenarios (degradations, core dumps, etc.)
      3. Rally: randomized triggers, SLA extensions (e.g. t-test with p-values – see the t-test sketch after this list)
      4. Metrics gathering plugin
    2. Shaker enhancements (rally + shaker + os-faults)
      1. os-faults hook mechanisms
      2. Storage, CPU/memory (also cases with SR-IOV, DPDK, etc.)
      3. os-faults for data plane software failures (cinder driver, vrouter, kernel stalls, etc.)
      4. Develop SLA measurement hooks
    3. os-faults underlay enhancements & data center emulator
      1. os-faults: Underlay crash & degradation code
      2. Data center emulator with ENoS to model underlay & software
    4. Traffic models & KPI measurement
      1. Realistic traffic models (CP + DP) and software to emulate the models
      2. Real KPI and scaled KPI to measure in virtual environments
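
A sketch of the t-test SLA extension mentioned under Implementation Priorities: compare latencies from a baseline run against a run with faults injected, and fail the SLA when the degradation is statistically significant. The alpha value and the samples are illustrative only.

    from scipy import stats

    def degradation_significant(baseline, faulted, alpha=0.05):
        """One-sided Welch t-test: is the faulted mean latency higher?"""
        t, p_two_sided = stats.ttest_ind(faulted, baseline, equal_var=False)
        p = p_two_sided / 2.0 if t > 0 else 1.0 - p_two_sided / 2.0
        return p < alpha

    # Illustrative samples: API latency (s) before and during fault injection.
    baseline = [0.21, 0.19, 0.22, 0.20, 0.23, 0.21]
    faulted = [0.35, 0.41, 0.33, 0.38, 0.36, 0.40]
    print(degradation_significant(baseline, faulted))  # True -> fail the SLA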

...