A Data Story: An Exponam Origin Tale

Data moves.  A lot.

Intra-company.  Inter-company.  For analysis.  For distribution and sale.  For teams working from home.  For partners, vendors, regulators.  To the cloud.  From the cloud.

Downloading, moving, and uploading data is slow.  Firms address inefficient data movement in two ways:

  • By centralizing data in the cloud and providing pointers with access rather than moving the data. For use cases in which data is not needed in other locations, this is a good solution.
  • By working to improve bandwidth, limit packet loss, reduce distances – all to speed data transmission. Yet even when transmission is optimized, time to extract and import data is a constraint.

Exponam Origins

Unsatisfied with current solutions to speed data sharing and distribution, we sought a new approach.  Rather than addressing the issue of time to move a single bit of data, we focused on the data itself.  What could we do to decrease the size of the package, and make it more efficient for extract and import?

Today, data is transferred in different formats – CSV, JSON, XML – but these formats all suffer the same fundamental flaws: they are large and inefficient to extract and import.  These formats have been in use for generations – from a time when data sets were much smaller, when a few million rows was a tremendous amount of data.

Exponam .BIG

At Exponam, we have created a new data file type – tailor-made for today’s data sets.  Our data file, a .BIG file, is highly compressed and optimized for efficient extract and import.  It can be used to transport, share, and explore hundreds of millions of rows quickly and easily.  In tests, a .BIG file is blazingly fast when transferring ultra-large data sets – moving in minutes data that took hours or days in other formats.

 

Accessing Data

Once files have been distributed and shared, we need an easy way to access and explore data. With the Exponam Explorer, a user can instantly open, filter, sort, and find data from within a .BIG file.

It is easy to explore data of any size – hundreds of rows or millions of rows – making this a great solution when data won’t fit in an Excel spreadsheet.

Users no longer need to spend hours migrating data to databases and writing queries – data is available instantly in a spreadsheet.  Alternatively, users can query .BIG files directly via JDBC – with the query performance of a database.
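As a rough sketch of what querying a .BIG file over JDBC could look like from Python: the driver class name, JAR name, and URL scheme below are assumptions for illustration, not Exponam’s documented values, and the `trades` table is hypothetical.

```python
# Hypothetical sketch: querying a .BIG file through a JDBC driver from Python.
# The driver class, JAR, and URL scheme are assumed – consult Exponam's
# documentation for the actual identifiers.
import jaydebeapi  # third-party bridge from Python to JDBC drivers

conn = jaydebeapi.connect(
    "com.exponam.jdbc.Driver",          # assumed driver class name
    "jdbc:exponam:/data/trades.big",    # assumed JDBC URL scheme
    jars="exponam-jdbc.jar",            # assumed driver JAR
)
try:
    cur = conn.cursor()
    # Standard SQL against the file, as if it were a database table
    cur.execute("SELECT symbol, SUM(qty) FROM trades GROUP BY symbol")
    for row in cur.fetchall():
        print(row)
finally:
    conn.close()
```

The point of the pattern: any tool that speaks JDBC can treat the .BIG file as a queryable source, with no intermediate database load.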

 

Data Security

Quickly transferring and accessing data aren’t the only data problems.

The world has a major problem with data security.  We are constantly learning about yet another company which experienced a major data breach.  Not only is data stolen from databases, it is stolen from data files extracted from databases.  Extracted data files are not secure.  And they are everywhere.

So we made .BIG files fully secure – exceeding today’s security demands around information.  And they stay secure: unlike other compression options, a .BIG file never needs to be decrypted or decompressed into an unprotected form.  Exponam .BIG files are secure at rest, in process, and in transit.

Data files sitting in email, on laptops, and in the cloud are a security risk as long as they exist.  A file that was downloaded or shared one day can be compromised years later.  Exponam .BIG files can be generated with specified durations for file access – from hours to days or years.  What’s more, .BIG files can be dynamically controlled – enabling user-specific entitlements and access rights.

.BIG files are tamper-proof, and their provenance is guaranteed.  The publisher can be verified, and both the file properties and the data are unalterable.

 

About Us

At Exponam, we evangelize “Empowerment through Data.”  We have created the Exponam Explorer™, Exponam Builder™, and Exponam .BIG™ file format to enable Secure Sharing and Exploring of Data.  Learn more.  Visit us at www.exponam.com.


The Myth of Analytics Self-Service

 

THE BIGGEST LIE ANALYTICS VENDORS TELL: 
Our users do their analysis within our platform 

 

FACT:
Users download data – to send to external parties, for further analysis, to upload elsewhere. AND YOU provide the ability to download CSVs. But when faced with downloading more than a few thousand rows of data, you offer no solution. “User self-service” becomes “ask IT.”

 

We understand: Downloading millions of rows is hard

  • Too large for a CSV
  • Can’t open the file in Excel or Sheets
  • Corporate security concerns

The answer is EXPONAM

With Exponam .BIG files, your users can easily download datasets with hundreds of millions of rows.

  • Download .BIG files as easily as downloading CSVs
  • Access millions of rows instantly in a spreadsheet
  • Easily filter and sort data
  • One click push to Excel
  • Fully secure data
  • Ultra-compressed
  • Files are immutable, provenance and lineage are tracked
  • Files are stamped with originating system details – extending your reach to all who interact with the files

You do the heavy lifting – making data accessible, providing visualizations, uncovering insights.  Let us help you provide true user self-service for secure, large-data download, sharing, and exploration.


Exponam & Apache Spark

 

Exponam’s direct integration with Apache Spark, including Databricks’ commercial offering, improves the time-to-value of the quantitative, analytic, and machine learning results available with Spark.  Exponam’s integration delivers these advantages:

  • A native data source for loading and saving Exponam .BIG files

    The native data source is built with Exponam’s powerful core technology, which dynamically tunes itself to your enterprise Spark clusters’ runtime capabilities.  This ensures lean execution, high performance, and brisk throughput to and from Spark’s internal RDD (resilient distributed dataset) structures.  With Exponam, you can ingest large datasets into Spark from highly compressed import files without wasting space and time.  And you can egress Spark data into a format that is orders of magnitude more compressed than standard delimited formats, allowing much larger datasets to be faithfully preserved for audit and archival needs.
  • Frictionless access with Spark DataFrames

    Exponam data load and save operations are available using the standard DataFrame syntax that data scientists use every day, whether with Scala, Python, or Spark SQL.  Exponam’s default options can be trivially overridden using standard DataFrame options, unleashing the full power of Exponam’s underlying technology: security, file optimization levels, story files, and application-defined supplemental metadata.

    An Exponam file can contain any number of tables, each with its own schema and row-level data.  Each table can be loaded individually, allowing a single Exponam file to transport entire rich repositories of data into Spark.  Exponam’s schemas eliminate the potential ambiguity of inferred schemas, and mean that the native representation of objects in RDDs is always optimal and correct.

    Further, Exponam’s save operation with Spark DataFrames allows the flexibility that DataFrame users demand.  Save can be invoked in a cluster-aware fashion, with each node in the cluster generating an output file for its local data only, which can be advantageous for extremely large RDDs.  Alternatively, DataFrame results can be coalesced (or glom’ed) through the master node, resulting in a single output file.  The point is that Exponam allows you to use the pattern that best fits your cluster profile and data egress requirements.
  • Data lineage

    Modern data architectures seek to preserve data lineage across disparate products and solutions, an almost insurmountable task when data is moved between traditional silos, compute grids, and data grids.  With Exponam, the provenance of data is integral to the file itself.  This allows solution architectures using Apache Spark to maintain data lineage from ingest through egress, so that the linkage to upstream systems is faithfully preserved.
  • Security

    Standard data exchange formats for Spark require data that is unencrypted at rest.  Exponam, in contrast, is always encrypted at rest, even as it is being loaded into the cluster.  The attack surface for a potential data breach is demonstrably smaller with Exponam.

    Further, Exponam’s default behavior on load operations is to first establish the integrity of the file.  If the file has been tampered with, the load fails with a standard Spark exception, and absolutely no row-level data is generated in Spark.
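The DataFrame usage described above might look like the following PySpark sketch.  The data source name ("exponam"), the "table" option, and the file paths are assumptions for illustration – the real identifiers belong to Exponam’s documentation.

```python
# Hypothetical sketch of Exponam's Spark data source, assuming a format
# name of "exponam" and a "table" selection option – the real identifiers
# may differ.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("exponam-demo").getOrCreate()

# Load one table from a multi-table .BIG file; the file's embedded schema
# is used directly, so nothing has to be inferred.
df = (spark.read
      .format("exponam")           # assumed data source name
      .option("table", "trades")   # assumed option for picking a table
      .load("/data/positions.big"))

# Cluster-aware save: each node writes an output file for its local data ...
df.write.format("exponam").save("/out/per-node")

# ... or coalesce through one node to produce a single output file.
df.coalesce(1).write.format("exponam").save("/out/single-file.big")
```

Either save pattern uses only standard DataFrame calls, which is the “frictionless access” point: the choice between per-node and single-file output is one `coalesce(1)` away.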

 
