Jamie Caramanica is the software architect lead at DISCO.
When we created our Excel viewer in DISCO Ediscovery, we wanted users to feel like they were using Microsoft Excel. Though spreadsheets are seemingly simple, there is a mile of complexity under the surface, which made for some pretty significant challenges. While there are some third-party components that help with handling Excel files, we found none that could provide all the capabilities we wanted to provide users for a truly magical experience. So we customized a third-party component heavily to achieve our goals around ease of use, performance, and user capabilities.
Since we launched the native Excel viewer in DISCO Ediscovery, our users have asked for a similarly intuitive way to redact Excel files natively instead of redacting on the image/PDF of an Excel file. Most ediscovery platforms either entirely lack this capability or make it cumbersome and expensive to use. So, our latest project has been designing and developing the capability to redact Excel files in a native-like experience. Here’s a technical look at how we created optimal performance, solved Excel challenges, and ensured perfect productions with this new feature.
Performance, performance, performance
One of the hallmarks of DISCO Ediscovery is performance, and the native Excel viewer had to keep those high standards. This is where the challenges of a web application come into play. Frame rate (frames per second) is a measure of a website’s perceived responsiveness. As a user redacts and draws boxes on the document, we needed to ensure that the page continued to be responsive, especially when the user was scrolling down the page.
A familiar example of this is the “refresh rate” (a measurement similar to frame rate) of premium smartphones. When using one, you will notice just how smooth the display is when you are scrolling. This is because it has a higher refresh rate, or frame rate, than other phones (e.g., 120 Hz vs. 60 Hz).
To ensure a pleasant user experience, the redactions are actually overlaid on top of the spreadsheet viewer component. Once that was done, we had to architect a way to make it seem like the redaction was being done inside the component. When the user selects cells, rows, or columns, the redaction needs to snap to the correct edges of the range. If the user resizes a column or row that contains redactions, those redactions need to adapt smoothly to the new size.
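The snapping behavior can be illustrated with a minimal sketch. It assumes the overlay can read the current column widths and row heights in pixels; the `CellRange` type and the function names are hypothetical and not part of DISCO's code or the third-party component. The overlay rectangle is derived from the gridline offsets, so recomputing it after a resize is enough to keep the redaction aligned.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class CellRange:
    # Zero-based, inclusive indices (a hypothetical representation).
    first_row: int
    last_row: int
    first_col: int
    last_col: int


def edge_offsets(sizes):
    """Return the pixel offset of every gridline: [0, s0, s0 + s1, ...]."""
    return [0, *accumulate(sizes)]


def overlay_rect(rng, col_widths, row_heights):
    """Return (left, top, width, height) of the overlay for a cell range."""
    xs = edge_offsets(col_widths)
    ys = edge_offsets(row_heights)
    left, right = xs[rng.first_col], xs[rng.last_col + 1]
    top, bottom = ys[rng.first_row], ys[rng.last_row + 1]
    return (left, top, right - left, bottom - top)


col_widths = [64, 64, 120, 64]
row_heights = [20, 20, 20]
redaction = CellRange(first_row=1, last_row=2, first_col=1, last_col=2)

print(overlay_rect(redaction, col_widths, row_heights))  # (64, 20, 184, 40)

# The user widens the third column; recomputing snaps the overlay to the new edge.
col_widths[2] = 200
print(overlay_rect(redaction, col_widths, row_heights))  # (64, 20, 264, 40)
```

Storing the redaction as a cell range rather than as pixel coordinates is the design choice that makes resizing cheap: the pixels are always derived, never persisted.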
There are also other objects to deal with, such as images and charts. The user can display or hide these objects, and redactions on the cells below those objects need to be layered correctly. Images can be redacted as well, so the image redactions need to follow along when the images are being displayed or hidden.
Now imagine navigating through hundreds of spreadsheets, each spreadsheet with dozens of sheets, and each sheet with dozens of redactions overlaid. We architected our Excel viewer to maintain our hallmark doc-to-doc rendering speed, even with this large computational and data load. To show how serious DISCO is about speed, nightly automated tests calculate the frame rate while applying redactions and scrolling, so that no code is added that could adversely impact this experience.
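The details of those nightly tests are not described here, but the core calculation can be sketched. The example below assumes a test harness has already recorded one timestamp per rendered frame, in milliseconds, while scrolling; the function names and both thresholds are hypothetical.

```python
def frame_stats(timestamps_ms):
    """Return (average_fps, worst_frame_ms) from per-frame timestamps in ms."""
    if len(timestamps_ms) < 2:
        raise ValueError("need at least two frame timestamps")
    deltas = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    elapsed = timestamps_ms[-1] - timestamps_ms[0]
    average_fps = 1000.0 * len(deltas) / elapsed
    return average_fps, max(deltas)


def scroll_is_smooth(timestamps_ms, min_fps=50.0, max_frame_ms=50.0):
    """Pass only if the average rate is high and no single frame stalls."""
    fps, worst = frame_stats(timestamps_ms)
    return fps >= min_fps and worst <= max_frame_ms


# 61 frames, 16 ms apart: a steady 62.5 frames per second.
smooth = [i * 16.0 for i in range(61)]
# The same run with a 200 ms stall inserted halfway through.
janky = smooth[:30] + [t + 200.0 for t in smooth[30:]]

print(scroll_is_smooth(smooth))  # True
print(scroll_is_smooth(janky))   # False
```

Checking the worst frame as well as the average matters: the stalled run still averages above 50 frames per second, yet a user would clearly notice the hitch.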
Interesting UI challenges
Want to know just how much thought went into native Excel redactions? Here are a few examples of the challenges the team faced.
For redactions that cover an entire row or column, our design team came up with a great interface: the user can always see the redaction reason at the top left of the redaction and interact with the redaction’s edit/delete icon on the far right. Of course, the view changes as the user scrolls through the document, either horizontally or vertically. Ensuring the proper placement and flow of this component required a lot of engineering.
During the thorough tests that DISCO does for every feature, we tried to redact as many different types of Excel files as possible. One sticking point was merged cells. What should happen if you redact a column and there is a merged cell that is only partially in that column? How should selection work when merged cells are in play? What is the expectation of the produced, redacted PDF where merged cells are involved? We decided a merged cell should only be redacted if it is explicitly selected and redacted or completely covered when selecting a range, column(s), or row(s). Thus, the UI needs to show the user what will happen, especially when the user is trying to select multiple cells. Due to rigorous testing, we were able to uncover this gap and implement a high-quality solution that all users should appreciate.
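The merged-cell rule can be expressed as a containment check. This is a minimal sketch, assuming that ranges are stored as zero-based, inclusive row and column indices; the `Range` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Range(NamedTuple):
    # Zero-based, inclusive bounds (a hypothetical representation).
    first_row: int
    first_col: int
    last_row: int
    last_col: int


def overlaps(a, b):
    """True if the two ranges share at least one cell."""
    return (a.first_row <= b.last_row and b.first_row <= a.last_row
            and a.first_col <= b.last_col and b.first_col <= a.last_col)


def contains(outer, inner):
    """True if every cell of inner lies inside outer."""
    return (outer.first_row <= inner.first_row and inner.last_row <= outer.last_row
            and outer.first_col <= inner.first_col and inner.last_col <= outer.last_col)


def classify_merged(selection, merged_regions):
    """Split merged regions touching the selection into (redacted, skipped)."""
    redacted, skipped = [], []
    for region in merged_regions:
        if not overlaps(selection, region):
            continue
        (redacted if contains(selection, region) else skipped).append(region)
    return redacted, skipped


column_b = Range(0, 1, 9, 1)       # column B, rows 1-10
merged = [
    Range(1, 1, 2, 1),             # B2:B3, fully inside the column
    Range(4, 1, 4, 2),             # B5:C5, only partially inside
    Range(0, 3, 0, 4),             # D1:E1, untouched
]
redacted, skipped = classify_merged(column_b, merged)
print(redacted)  # [Range(first_row=1, first_col=1, last_row=2, last_col=1)]
print(skipped)   # [Range(first_row=4, first_col=1, last_row=4, last_col=2)]
```

The `skipped` list is what a UI could highlight so the user sees, before confirming, which merged cells will be left unredacted.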
At first, you would think that unselecting or unredacting would be pretty easy. But then you start trying to implement it. Say there is a range of cells redacted. The user then selects three different, non-contiguous cells inside that range and clicks “unredact.” What should happen? First, you need to determine the mathematical logic for breaking up that one redaction into separate ones that would make sense to the user. Then, you have to figure out the edge cases that make it hard, such as a merged cell sitting right where two different redactions would have been created, which is a no-go. Even something as simple as selecting a range and then Ctrl/Cmd-clicking to “unselect” a cell is not handled by the third-party component that we use. So, we had to architect a solution around that gap.
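One possible way to break a redaction apart is sketched below. It is not necessarily the logic DISCO uses: it finds the horizontal runs of still-redacted cells on each row, then stacks identical runs from consecutive rows into one rectangle. It ignores merged cells, and the result is not guaranteed to be the smallest possible set of rectangles. The tuple layout and function name are hypothetical.

```python
def split_redaction(rng, removed):
    """Split a redacted range into rectangles after some cells are unredacted.

    rng is (first_row, first_col, last_row, last_col), zero-based and inclusive.
    removed is a set of (row, col) cells to unredact.
    Returns a sorted list of rectangles in the same tuple layout.
    """
    r0, c0, r1, c1 = rng
    open_rects = {}  # (first_col, last_col) -> first row of a rectangle still growing
    result = []
    for row in range(r0, r1 + 2):  # one extra pass closes whatever is still open
        runs = set()
        if row <= r1:
            start = None
            for col in range(c0, c1 + 2):  # one extra column ends a trailing run
                kept = col <= c1 and (row, col) not in removed
                if kept and start is None:
                    start = col
                elif not kept and start is not None:
                    runs.add((start, col - 1))
                    start = None
        # Close rectangles whose run does not continue onto this row.
        for run in list(open_rects):
            if run not in runs:
                result.append((open_rects.pop(run), run[0], row - 1, run[1]))
        # Start rectangles for runs that first appear on this row.
        for run in runs:
            open_rects.setdefault(run, row)
    return sorted(result)


# A 4x4 redaction with three non-contiguous cells unredacted.
pieces = split_redaction((0, 0, 3, 3), {(0, 0), (1, 2), (3, 3)})
for piece in pieces:
    print(piece)
# (0, 1, 0, 3)
# (1, 0, 1, 1)
# (1, 3, 1, 3)
# (2, 0, 2, 3)
# (3, 0, 3, 2)
```

Even this small case turns one redaction into five, which hints at why deciding on splits that "make sense to the user" takes more than geometry.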
Ensuring perfect productions
Productions came with their own set of design challenges — with even more pressure to make things easy and intuitive in a high-stakes situation. For the most part, the way that DISCO produces a PDF for a spreadsheet that was redacted in the native viewer is similar to how a user would redact in Excel itself and then save to PDF. Here are some of the challenges we addressed.
Many of the challenges concerned how things would look visually when creating the PDF. For example, say a user redacts a cell with a redaction reason that is bigger than the cell. To identify this issue, for every redaction, we need to calculate the font size of that cell and how big the redaction reason is. If the text is bigger than the cell, a symbol needs to be used in its place and a dagger legend has to be shown, which then impacts how much can be shown on the page.
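A rough sketch of that fit check follows. It assumes a crude width estimate (average glyph width of 0.6 times the font size) in place of real font metrics, and a conventional footnote symbol sequence; the names, the ratio, and the symbol set are all hypothetical.

```python
SYMBOLS = ["†", "‡", "§", "¶"]  # hypothetical marker sequence


def symbol_for(index):
    """Return the marker for the nth distinct reason, doubling after one cycle."""
    return SYMBOLS[index % len(SYMBOLS)] * (index // len(SYMBOLS) + 1)


def estimated_text_width(text, font_size_pt, char_width_ratio=0.6):
    """Approximate text width in points; real code would use font metrics."""
    return len(text) * font_size_pt * char_width_ratio


def label_redactions(redactions):
    """redactions: list of (reason, cell_width_pt, cell_height_pt, font_size_pt).

    Returns (labels, legend): the text to draw in each cell, plus a legend
    mapping each reason that did not fit to its symbol.
    """
    labels, legend = [], {}
    for reason, width, height, font_size in redactions:
        fits = (estimated_text_width(reason, font_size) <= width
                and font_size <= height)
        if fits:
            labels.append(reason)
        else:
            if reason not in legend:
                legend[reason] = symbol_for(len(legend))
            labels.append(legend[reason])
    return labels, legend


labels, legend = label_redactions([
    ("Attorney-Client Privilege", 64, 15, 11),  # about 165 pt wide: too long
    ("PII", 64, 15, 11),                        # about 20 pt wide: fits
    ("Attorney-Client Privilege", 48, 15, 11),  # reuses the same symbol
])
print(labels)  # ['†', 'PII', '†']
print(legend)  # {'Attorney-Client Privilege': '†'}
```

The size of `legend` is what feeds back into page layout: every entry is a line of legend text that takes space away from the sheet content.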
Another nicety comes into play when a user redacts a column that would span many pages when printing to PDF — which is one of the top reasons to redact in a native viewer. A user would want to have the redaction reason at the top of every printed page. So the print area needs to be calculated for every section and then adjustments need to be made so that the redaction reason is shown for each page in the correct location.
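The per-page placement can be sketched under simplifying assumptions: rows are paginated greedily by height, a single row never needs splitting, and horizontal page breaks and custom print areas are ignored. The function names are hypothetical.

```python
def paginate_rows(row_heights, page_height):
    """Greedy pagination: return a (first_row, last_row) pair for each page."""
    pages = []
    start, used = 0, 0.0
    for row, height in enumerate(row_heights):
        # Break before this row if it would overflow a page that already has rows.
        if used + height > page_height and row > start:
            pages.append((start, row - 1))
            start, used = row, 0.0
        used += height
    if row_heights:
        pages.append((start, len(row_heights) - 1))
    return pages


def label_positions(first_row, last_row, pages):
    """Return (page_index, label_row) for each page the redaction appears on."""
    positions = []
    for index, (page_first, page_last) in enumerate(pages):
        if first_row <= page_last and last_row >= page_first:
            # The reason goes on the first redacted row visible on that page.
            positions.append((index, max(first_row, page_first)))
    return positions


# Ten 20 pt rows with 70 pt of printable height: three rows fit per page.
pages = paginate_rows([20] * 10, page_height=70)
print(pages)                         # [(0, 2), (3, 5), (6, 8), (9, 9)]

# A column redaction covering rows 1 through 7 needs a label on three pages.
print(label_positions(1, 7, pages))  # [(0, 1), (1, 3), (2, 6)]
```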
Page counts and Bates numbering
Another complex change involved the possibility that the PDF produced from a natively redacted spreadsheet could have a different page count than the PDF that is created for use in the PDF viewer. We needed to adjust the way that productions were constructed and the order in which Bates numbers were allocated so that every page would be accurate. For example, a cell with overflow text on one sheet could cause multiple pages to be created when the Excel file was saved to PDF. But, if that cell is redacted, the user wouldn’t want to have empty redacted pages, since there is no reason to have those extra pages anymore.
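The ordering constraint can be illustrated with a small sketch: Bates ranges are allocated only once each document's final page count is known, so a redacted spreadsheet that shrinks or grows shifts every later range consistently. The prefix, the number width, the file names, and the function name are hypothetical.

```python
def allocate_bates(documents, prefix="ABC", start=1, width=7):
    """Assign consecutive Bates ranges from final page counts.

    documents: list of (doc_id, page_count) in production order.
    Returns {doc_id: (first_bates, last_bates)}.
    """
    next_number = start
    ranges = {}
    for doc_id, page_count in documents:
        if page_count < 1:
            raise ValueError(f"{doc_id}: a produced document needs at least one page")
        first = next_number
        last = next_number + page_count - 1
        ranges[doc_id] = (f"{prefix}{first:0{width}d}", f"{prefix}{last:0{width}d}")
        next_number = last + 1
    return ranges


# Suppose the spreadsheet renders to 5 pages for the PDF viewer, but only 3
# once the cell with overflow text is redacted. The final counts drive numbering.
final_counts = [("memo.pdf", 2), ("budget.xlsx", 3), ("email.msg", 1)]
for doc_id, bates in allocate_bates(final_counts).items():
    print(doc_id, bates)
# memo.pdf ('ABC0000001', 'ABC0000002')
# budget.xlsx ('ABC0000003', 'ABC0000005')
# email.msg ('ABC0000006', 'ABC0000006')
```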
Designing with intent
Being an engineer at DISCO always brings intriguing challenges — which is why we love working here! With DISCO’s focus on the macro and micro of a feature, we are able to implement one of the best user experiences possible. As a team, we are able to stand proudly behind the features we implement — and redacting a spreadsheet in the native Excel viewer is now a wonderful addition to DISCO Ediscovery.