
How to effectively test PCI Express storage peripherals in the real world

An important part of storage device development is the interoperability testing stage. The overall objective is to ensure, through full-scope testing, that the system under test (SUT) meets its quality and design specifications. If the limitations of the test tools are not well understood, selecting tools and matching them against the system test specifications can leave test coverage incomplete. This applies especially to the recent PCI Express (PCIe) host interfaces of storage devices. Data communications have made a huge leap with this technology, and the protocol structures have become more advanced in order to increase performance. Unlike previous generations of storage peripherals, these new structures use scrambling and data protection mechanisms to ensure reliable transmission. For example, storage command structures for protocols such as Non-Volatile Memory Express (NVMe), SATA Express, and SCSI Express are now encoded and contained in the PCI Express payload. It is this change that makes testing more complicated: the test tools of the past no longer get the job done.

Over the years, designers and interoperability test engineers have used many test technologies to determine how their new equipment will perform once released into the real world. The choice of testing technique and of tools for each methodology depended on the particular attributes of the protocol and on how reliably the tool performed the required or desired task. A good example is the use of "in-line error injectors" by storage engineers and technicians to identify real-time issues in protocols such as Serial Attached SCSI (SAS).

In-line error injectors, or "jammers" as they are known, are equipment that manipulates or impairs protocol traffic on a link between two nodes.

Storage hosts and devices (also called initiators and targets) are a good example of such nodes. Designers use jammers to create error conditions deliberately and in a well-controlled manner in order to test the error detection and recovery routines of new equipment. It is important that nothing else about the link or its transmission characteristics changes while the error is inserted; otherwise the test results will be uncertain.

A very common test plan is a list of tests describing how to create the important error conditions, together with the criteria used to decide whether the response to each error stimulus is valid. These scenarios are implemented on the test equipment and run against the system under test (SUT). Simple tests can range from inserting or deleting frames to replacing a "good" status response with an "error" status response. All of this is done to determine whether the system recognizes the error and responds correctly to the condition created.

Through a careful system of tests, the error recovery mechanisms are continually refined until the device operates reliably under both normal and stress conditions. The overall objective is to ensure that the SUT meets design and quality specifications through comprehensive test coverage.

Jammer Testing Capabilities 

The jammer is normally wired directly in-line between the initiator and the target. An ideal jammer is electrically invisible and non-intrusive to the SUT. By controlling the protocol traffic and introducing specific error conditions, the system's responses to various communication failure scenarios can be tested.

The real benefit of the jammer is that it can perform error testing including the following functions:

 

1. Inject errors in-line into a real-world situation:

a. Bit error injection

b. CRC modification

c. Frame modification

d. Primitive modification

e. Link connect or disconnect

f. Out-of-Band (OOB) and Speed Negotiation Window (SNW) modification

2. Verify whether the SUT recovers from error conditions without data loss or corruption.
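
As a rough illustration of items 1a and 1b, the sketch below flips a single bit in a frame and shows the receiver's CRC check catching it. It is a toy model, not the framing of any particular protocol: the frame layout, the helper names, and the use of Python's `zlib.crc32` are assumptions made for the example, and real SAS or SATA links define their own CRC and framing rules.

```python
import zlib

def append_crc(payload: bytes) -> bytes:
    """Sender side: return the payload followed by its 4-byte CRC-32."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def crc_ok(frame: bytes) -> bool:
    """Receiver side: recompute the CRC over the payload and compare."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received

def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Jammer-style bit error injection: invert one bit, leave the rest alone."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)

frame = append_crc(b"READ LBA 0x1000")   # hypothetical command payload
jammed = flip_bit(frame, 10)             # corrupt one bit in transit

print(crc_ok(frame), crc_ok(jammed))     # True False
```

The SUT passes this kind of test when it notices the bad frame and recovers, for example by requesting a retransmission, without losing or corrupting data.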

 

This simple test methodology makes the jammer an indispensable tool for the protocol communication used in storage systems. Parallel ATA, SCSI, Fibre Channel, and SAS/SATA test environments require these types of tools to determine how the device performs in the system and how well it responds when operating under less-than-ideal bus conditions.

Jammers were so effective in traditional storage applications because they allowed for a simpler method of testing. In the past, storage communication between an initiator host and its target devices was less complicated. When the host sent a command, the target returned a status byte indicating that the command had succeeded, that an error had been encountered, or that the target was busy serving other requests. The host then moved on to the next command in the sequence, or retransmitted the command until it finally succeeded.

Ribbon cables carried the data at low speeds, and their length was restricted by crosstalk noise from the cables themselves. It was around this time that the jammer became a standard tool in storage testing. Setting up a jam scenario was not complicated, because the protocol had few safeguard mechanisms that could interfere with the modification of in-line traffic.
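
The command and status exchange described above can be sketched as a simple retry loop. This is a minimal model under stated assumptions: the `Status` values, `send_with_retry`, and `make_jammed_target` are hypothetical names invented for the example and do not correspond to any real protocol's status codes.

```python
from enum import Enum

class Status(Enum):
    GOOD = 0
    ERROR = 1
    BUSY = 2

def send_with_retry(send, command, max_attempts=3):
    """Send one command; retransmit on ERROR or BUSY until it succeeds."""
    for attempt in range(1, max_attempts + 1):
        if send(command) is Status.GOOD:
            return attempt          # number of transmissions that were needed
    raise RuntimeError(f"{command!r} failed after {max_attempts} attempts")

def make_jammed_target(replacements):
    """Target that always succeeds, with a jammer replacing its first replies."""
    pending = list(replacements)
    def send(command):
        return pending.pop(0) if pending else Status.GOOD
    return send

# The jammer swaps the first two "good" replies for ERROR and BUSY.
target = make_jammed_target([Status.ERROR, Status.BUSY])
attempts = [send_with_retry(target, cmd) for cmd in ("READ", "WRITE")]
print(attempts)  # [3, 1]
```

Because the status byte was the only thing standing between the jammer and the host's recovery logic, replacing it was enough to exercise the retry path.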

Testing Storage Protocols in the Past

 

As data communication evolved to serial protocol transmissions, things changed from a data integrity standpoint. Storage eventually adopted point-to-point architectures, moving away from shared channels in an attempt to increase speeds and reduce driver complexity and cost. Differential signaling, 8b/10b encoding, and alignment primitives were used to reduce noise issues in the physical layer.

At the link layer, Frame Information Structures (FISs), packets containing payload data or control information, were created, offering a higher level of link control than was present in earlier Parallel ATA systems. Once the data is processed at the link layer, the transport layer processes the FIS information and sends the payload to the command layer for execution. These data structures became the focus of data verification test plans.

Jammers were ideal for testing these data structures and were widely used by the storage industry. As these protocol interfaces have increased in complexity and speed, many jammer functions have had to be revised or their analysis capabilities cut back, reducing their overall capability and effectiveness. In recent years, the storage industry has adopted PCI Express-based SSD interfaces such as NVMe, SATA Express, and SCSI Express to meet the need for continually increasing performance by reducing protocol latencies. The new protocols use scrambling and data protection mechanisms that ensure the reliable data transmission these high-performance interfaces require.

PCI Express layers its data protection mechanisms on top of existing physical layer features such as differential signaling, scrambled data transmission, and spread-spectrum clocking. The test methodologies developed around PCI Express (PCIe) include protocol analyzers, and protocol generators that produce controlled test stimuli for both host and device systems.

The generator is flexible enough to provide low-level emulation of a PCIe host or device. With the generator you can run stress tests and complex test cases that control protocol traffic, including speed changes, link width, flow control, and TLP and DLLP transactions. Various error conditions can also be created through simple scripting techniques.

Naturally, the storage validation community has remained interested in using its previous methodology and equipment to validate PCIe storage systems. Driven by this desire, initial efforts were made to define and develop a PCIe jammer. Because of the limitations imposed by the PCIe protocol, such as its delivery mechanisms and its command/control encapsulation structures, the result of these efforts was equipment severely lacking in overall capability and effectiveness for testing PCIe storage systems, offering only a small subset of the features of the original jammers.

Accordingly, many designers and interoperability test engineers have discontinued use of the jammer and have moved to more effective test tool implementations. These new methodologies are now widespread across the server, workstation, add-in card, and embedded systems industries. It is clear that the PCI Express protocol in itself has sufficient mechanisms to provide reliable point-to-point data transmission. Storage data and command structures for protocols such as NVMe, SATA Express, and SCSI Express are now encoded and contained in the PCIe data payload. These payloads are highly protected inside PCIe packets (or frames) and are delivered to their final destination without error.

Because of these data protection features of the PCI Express protocol, jamming data structures is not useful for testing hosts or devices. The section below covers some of the limitations that developers ran into when implementing error injection in PCIe jammers.

 

Barriers to Using PCIe Jammers in Storage Applications

 

There are two major hurdles a jammer must overcome to be a valid PCIe testing tool.

The first is to inject errors without disrupting the delivery mechanism of the protocol. If a PCIe jammer is placed between a device and a host, all PHY, link, and transaction packets have to pass through the jammer to be processed. The jammer needs time to decode and process each packet individually in order to find the specific structure to corrupt or modify. This delay eats into the tight ACK latency requirements of the PCIe data protection mechanisms.

If the jammer adds too much latency, each TLP will be replayed multiple times because the ACK response timer keeps expiring. The constant replay of TLPs degrades link performance, destabilizes normal traffic, and introduces changes in system behavior that lie outside the scope of the controlled error insertion.
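
The effect of added latency on replays can be sketched with a deliberately simplified timing model. The function name, the nanosecond values, and the assumption that the sender replays once per timer expiry are all illustrative; actual replay timer limits depend on link speed, link width, and payload size and are defined by the PCIe specification.

```python
import math

def replays_before_ack(base_round_trip_ns, jammer_delay_ns, replay_timeout_ns):
    """Count replay-timer expirations before the first ACK reaches the sender.

    Simplified model: the TLP and its ACK each cross the jammer once, and the
    sender replays the TLP every time the timer expires without an ACK.
    """
    ack_arrival_ns = base_round_trip_ns + 2 * jammer_delay_ns
    if ack_arrival_ns <= replay_timeout_ns:
        return 0
    return math.ceil(ack_arrival_ns / replay_timeout_ns) - 1

# Hypothetical link: 400 ns round trip, 1000 ns replay timeout.
results = [replays_before_ack(400, delay, 1000) for delay in (0, 200, 500, 1500)]
print(results)  # [0, 0, 1, 3]
```

In this model a jammer that adds 200 ns per direction is invisible to the replay mechanism, while one that adds 500 ns or more triggers replays that have nothing to do with the error the test intended to inject.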

The second hurdle is jamming command and control structures. In earlier protocols, the protocol header contained the command and control information. During transmission this header, which is separate from the data payload, was read and then possibly manipulated by the jammer as it executed the preprogrammed error scenario. PCIe storage protocols handle command and control information very differently: it is now contained in the payload, not the header, of the Transaction Layer Packet (TLP). In fact, all commands, data, and queue information are contained in the data payload, and in many areas of this new technology these are exactly the aspects of the protocol that developers want to test. To complicate matters, a single command can now span a large number of TLPs. For a jammer, modifying a specific storage command or data element can mean having to capture a large number of TLPs. Each TLP must be decoded and the extracted payloads assembled to rebuild the command and correctly identify the content of the target element before it is modified. The sequence must then be reassembled to preserve the original packet structure and forwarded on the live link. In addition, to make this work the jammer must maintain a copy of the PCIe environment so that it knows where the base address registers are located and can identify the starting point of every register transaction in the memory map.
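
The reassembly burden described above can be sketched as follows. This is a toy model under stated assumptions: the `Tlp` class, the 128-byte maximum payload, and the helper names are invented for the example, and headers, sequence numbers, and CRC fields are omitted. A real jammer would also have to regenerate the link CRC of every packet it changed.

```python
from dataclasses import dataclass

@dataclass
class Tlp:
    address: int      # target memory address of this write
    payload: bytes    # at most max_payload bytes

def split_into_tlps(base_address, data, max_payload=128):
    """Split one buffer into memory-write TLPs of at most max_payload bytes."""
    return [Tlp(base_address + off, data[off:off + max_payload])
            for off in range(0, len(data), max_payload)]

def modify_byte(tlps, base_address, offset, new_value):
    """What a jammer has to do: buffer, reassemble, patch, then re-split."""
    ordered = sorted(tlps, key=lambda t: t.address)
    buffer = bytearray(b"".join(t.payload for t in ordered))
    buffer[offset] = new_value
    max_payload = max(len(t.payload) for t in ordered)
    return split_into_tlps(base_address, bytes(buffer), max_payload)

tlps = split_into_tlps(0x1000, bytes(4096))        # one 4 KiB transfer
patched = modify_byte(tlps, 0x1000, 300, 0xFF)     # change a single byte
changed = [i for i, (a, b) in enumerate(zip(tlps, patched)) if a != b]
print(len(tlps), changed)  # 32 [2]
```

Only one of the 32 packets ends up different, but the jammer cannot know which one until it has captured and decoded enough of the sequence to locate the element, and all of that work happens while the link is waiting.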

The time a jammer takes to do this adds to the overall latency of the transmission between the initiator and the target. For these reasons, adding a jammer to a link affects the behavior of the link itself. For example, a simple ATA command can be split across 48 separate TLPs.

 

Best Way to Test PCIe Storage

 

Since PCIe data payloads are protected by sophisticated protocol mechanisms, any jammer tests are best directed at a subset of the PCIe link layer. For complete coverage, a PCIe protocol exerciser must be an integral part of the test setup. A PCIe exerciser lets users build a comprehensive test that covers storage commands and data as well as the PCIe link and transaction layers.

An exerciser can emulate either a host or a device in a storage system. This allows the SUT to be tightly controlled during testing without affecting its performance or the ACK latency requirements of the PCIe link.

Host or device emulation has been an important part of testing and validating the last three generations of PCIe devices. Exercisers now support the features of the new PCIe storage interfaces, including queue management, doorbell registers, and NVMe/SCSI/ATA commands.
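
To give a feel for the queue management and doorbell handling an exerciser has to emulate, here is a toy ring buffer loosely modeled on an NVMe-style submission queue. The class and method names are hypothetical, the doorbell is an ordinary attribute rather than a memory-mapped register, and completion queues are left out entirely.

```python
class SubmissionQueue:
    """Toy ring buffer with a tail doorbell, loosely modeled on NVMe queues."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.depth = depth
        self.tail = 0        # next free slot, owned by the host
        self.head = 0        # next slot to fetch, owned by the device
        self.doorbell = 0    # last tail value the host announced

    def submit(self, command):
        """Host side: place a command in the queue and ring the doorbell."""
        next_tail = (self.tail + 1) % self.depth
        if next_tail == self.head:
            raise BufferError("queue full")
        self.slots[self.tail] = command
        self.tail = next_tail
        self.doorbell = self.tail   # tells the device that new work exists

    def fetch_all(self):
        """Device side: consume every entry between head and the doorbell."""
        fetched = []
        while self.head != self.doorbell:
            fetched.append(self.slots[self.head])
            self.head = (self.head + 1) % self.depth
        return fetched

queue = SubmissionQueue(depth=4)
for command in ("IDENTIFY", "READ", "WRITE"):
    queue.submit(command)
print(queue.fetch_all())  # ['IDENTIFY', 'READ', 'WRITE']
```

An exerciser emulating a host plays the `submit` role and one emulating a device plays the `fetch_all` role, which lets either side misbehave on purpose (for example by announcing a doorbell value it should not) without inserting anything into the link.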

Device drivers are available that allow full emulation on live systems, so that BIOS behavior, including error handling, can be tested. Exercisers are operated through predefined scripts or through direct control methods. Either method can quickly produce test results that meet the needs of the test plan.

To configure your PCIe test system properly, it is important to select the test tools that best cover the area of interest.

Cost is an important factor in selecting test equipment, since the goal is the maximum test capability for the investment made. Modern PCIe design and validation labs incorporate protocol analyzers and exercisers into their systems for the greatest possible test coverage. Protocol analyzers monitor the bus, while multiple exercisers provide interoperability and stress testing. PCIe jammers are not considered a valid option because of their limited ability to test high-speed SSDs and other PCIe-based storage devices.

