Today’s enterprise challenges demand near real-time access to data. Yet while such access has become a must, the application ecosystem continues to offer a growing array of data persistence options, such as relational databases, NoSQL (not only SQL) databases, on-premises data warehouses and in-memory data stores, and the list goes on.
Having built a career architecting and developing data integration strategies for enterprises, I am aware of the benefits of maintaining a local copy of data, but I also recognize that doing so poses considerable challenges, including:
• It’s hard to understand your data sources.
• There’s an undeniable lag in data movement.
• You need more lead time in designing and building data integration patterns.
• You also need infrastructure availability.
How severe these challenges are, and what outcomes they lead to, varies with how robustly your organization implements and operationalizes its data integration. In my recent experience, data virtualization offers a distinct, alternative approach to this problem, depending on the use case.
What Is Data Virtualization?
Data virtualization (DV) is a data access platform that aggregates disparate data sources to create a single version of the data set for consumption. It provides a unified, abstracted, organized and encapsulated view of the data coming from similar or heterogeneous data sources while the data remains in source systems.
Data virtualization addresses the data movement challenge by keeping data at the source while still making it available to consuming applications in real time. Its data collaboration approach lets an application retrieve data as a single view without requiring the user to know the data’s technical details, such as its physical location, source formatting, security parameters or configuration settings. In areas such as business intelligence and analytics, application development and big data consumption, this platform can substitute for extract-transform-load (ETL) processes and data warehousing.
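To make the idea concrete, here is a minimal Python sketch, assuming two hypothetical sources: an in-memory SQLite database standing in for a relational CRM, and a CSV string standing in for a flat file owned by another system. The class name `VirtualCustomerView` and all table and column names are illustrative, not part of any DV product. The view reads both sources only when it is queried and combines them into one result, so no copy of the data is kept.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a relational database (in-memory SQLite stands in for it).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical source 2: a flat file owned by another system (a string stands in for it).
ORDERS_CSV = "customer_id,amount\n1,250.0\n2,100.0\n1,75.5\n"


class VirtualCustomerView:
    """A unified view that reads both sources at query time and stores no copy."""

    def __init__(self, db, read_orders):
        self._db = db                    # connection details stay hidden from consumers
        self._read_orders = read_orders  # callable that opens the flat file on demand

    def totals(self):
        # Extract from each source at request time: no staging tables, no replica.
        names = dict(self._db.execute("SELECT id, name FROM customers"))
        totals = {}
        for row in csv.DictReader(self._read_orders()):
            cid = int(row["customer_id"])
            totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
        # Integrate: join the two sources into a single result set.
        return [
            {"customer": names.get(cid, "unknown"), "total": amount}
            for cid, amount in sorted(totals.items())
        ]


view = VirtualCustomerView(crm, lambda: io.StringIO(ORDERS_CSV))
print(view.totals())  # [{'customer': 'Acme', 'total': 325.5}, {'customer': 'Globex', 'total': 100.0}]
```

A production DV platform would also push filters down to each source and handle security and caching; this sketch only shows the core idea of integrating data at read time.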
How Enterprise Data Virtualization Can Help Your Business
Data virtualization integrates data from diverse sources, locations and formats — without replication. A single “virtual” data layer is created in a process that delivers unified data services to support multiple applications and users while providing:
- Faster access to data, with nearly zero lag
- Significantly less lead time to design and implement data availability
- Less data redundancy
- More agility to change
According to the Data Management Body of Knowledge (DMBOK): “Data virtualization enables distributed databases, as well as multiple heterogeneous data stores, to be accessed and viewed as a single database. Rather than physically performing ETL on data with transformation engines, data virtualization servers perform data extract, transform and integrate virtually.”
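The contrast between physical ETL and virtual integration can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular DV server works: the source and the ETL target are hypothetical in-memory SQLite databases, and `CASE_SIZE` is a made-up transformation (the source stores cases, consumers want units). The ETL path copies transformed data in a batch; the virtual path applies the same transformation on every read.

```python
import sqlite3

CASE_SIZE = 12  # hypothetical transform: the source stores cases, consumers want units

# Hypothetical operational source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER)")
source.execute("INSERT INTO inventory VALUES ('A-100', 5)")

# Physical ETL target: a separate store that holds a transformed copy.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE inventory_units (sku TEXT, units INTEGER)")


def run_etl():
    """Extract from the source, transform, and load a copy into the warehouse."""
    rows = source.execute("SELECT sku, qty FROM inventory").fetchall()
    warehouse.execute("DELETE FROM inventory_units")
    warehouse.executemany(
        "INSERT INTO inventory_units VALUES (?, ?)",
        [(sku, qty * CASE_SIZE) for sku, qty in rows],
    )


def copied_units(sku):
    """Read from the ETL copy."""
    row = warehouse.execute("SELECT units FROM inventory_units WHERE sku = ?", (sku,)).fetchone()
    return row[0] if row else None


def virtual_units(sku):
    """Extract and transform at read time; nothing is copied."""
    row = source.execute("SELECT qty FROM inventory WHERE sku = ?", (sku,)).fetchone()
    return row[0] * CASE_SIZE if row else None


run_etl()  # e.g. a scheduled batch load
source.execute("UPDATE inventory SET qty = 2 WHERE sku = 'A-100'")  # the source changes afterwards

print(copied_units("A-100"))   # 60: stale until the next ETL run
print(virtual_units("A-100"))  # 24: reflects the current source
```

The copy is only as fresh as its last load, while the virtual read reflects the source as soon as it changes. The trade-off is that every virtual read places load on the source system.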
Data Virtualization’s Role In Your IT Department
Data virtualization capabilities allow IT departments to build technology into their core business strategy, which brings several benefits, including the following:
- Zero Replication: Integrated views of data draw from multiple sources without moving or replicating it, bypassing redundant copies and reducing storage footprints.
- Abstraction: Consumers access data without needing to know its location or configuration information.
- Real-Time Access: The latest version of data is instantly available.
- Agility: DV provides a common semantic layer for multiple consumer applications, reducing the disruption that changes in underlying systems cause across the enterprise.
- Logical Abstraction And Decoupling: DV connects distinct data sources, middleware and platform-specific consumer applications while decoupling them from one another’s interfaces, formats, schemas, security protocols and query paradigms.
- Semantic Integration Of Data: It bridges the semantic understanding of unstructured and web data with a schema-based approach.
- Agile Data Services Provisioning: Primary, derived, integrated or virtual data sources can be securely and quickly accessed through a different format or protocol than the original.
- Unified Data Governance And Security: All data, from source to output data services, is discoverable and cohesive through a single virtual layer, which exposes redundancy and quality issues and achieves consistent integration.
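Several of these benefits (abstraction, provisioning data in a different format than the original, and a single place to discover data) can be illustrated with a hypothetical access layer. In this sketch the `VirtualLayer` class, its `register`/`get` methods and the dataset names are assumptions for illustration only: consumers ask for a logical name and an output format, and never see the connection details or the source’s native format.

```python
import csv
import io
import json
import sqlite3


class VirtualLayer:
    """Hypothetical single access point: consumers ask for a logical name, not a location."""

    def __init__(self):
        self._catalog = {}  # logical name -> zero-argument function returning a list of dicts

    def register(self, name, fetch):
        self._catalog[name] = fetch

    def datasets(self):
        # One place to discover every dataset the layer exposes.
        return sorted(self._catalog)

    def get(self, name, fmt="json"):
        rows = self._catalog[name]()  # read live from the underlying source
        if fmt == "json":
            return json.dumps(rows)
        if fmt == "csv":
            out = io.StringIO()
            fields = list(rows[0]) if rows else []
            writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
            return out.getvalue()
        raise ValueError(f"unsupported format: {fmt}")


# Hypothetical relational source; its connection stays inside the fetch function.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (sku TEXT, price REAL)")
db.execute("INSERT INTO products VALUES ('A-100', 9.5)")


def fetch_products():
    cur = db.execute("SELECT sku, price FROM products")
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


# Hypothetical web source that returns JSON text.
WEB_FEED = '[{"sku": "A-100", "rating": 4}]'

layer = VirtualLayer()
layer.register("products", fetch_products)
layer.register("ratings", lambda: json.loads(WEB_FEED))

print(layer.datasets())              # ['products', 'ratings']
print(layer.get("products", "csv"))  # relational rows delivered as CSV
print(layer.get("ratings"))          # web data delivered as JSON
```

Because each consumer depends only on the logical name, a source can be moved or replaced by re-registering a new fetch function without changing the consumers.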
Preparing for Data Virtualization
There are several ways to get started on your virtualization project. Consider these three:
- Know Your Datasets
While you may be fully familiar with your core processing systems, you may know less about the peripheral or edge computing systems that generate their own forms of data, data that could, and probably will, enhance your analytics capabilities. Create an inventory of all your datasets, their locations and their management requirements; each dataset will need a specific translator so that it can engage fully with the virtualization layer.
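One way to make that inventory actionable is to record each dataset’s source type and check whether a translator exists for it. In this sketch the entries, locations and source types are hypothetical, and the two translators simply turn CSV or JSON text into rows; a real virtualization product supplies its own connectors.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    name: str
    location: str     # where the data lives (hypothetical values below)
    source_type: str  # determines which translator is needed
    requirements: list = field(default_factory=list)  # management requirements


# One translator per source type: each turns raw source text into a list of dicts.
TRANSLATORS = {
    "csv": lambda raw: list(csv.DictReader(io.StringIO(raw))),
    "json": json.loads,
}


def missing_translators(inventory):
    """Names of datasets that cannot yet engage with the virtualization layer."""
    return [d.name for d in inventory if d.source_type not in TRANSLATORS]


inventory = [
    DatasetEntry("orders", "erp/orders.csv", "csv", ["nightly export"]),
    DatasetEntry("sensor_readings", "edge-gateway-01/readings", "json", ["retain 30 days"]),
    DatasetEntry("plant_historian", "ot-network/historian", "opc-ua", ["read-only access"]),
]

print(missing_translators(inventory))  # ['plant_historian']
print(TRANSLATORS["csv"]("id,temp_c\n1,21.4\n"))  # [{'id': '1', 'temp_c': '21.4'}]
```

Running a check like this before the project starts surfaces the edge or peripheral sources that still need a connector.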
- Know Your Data
Some aspects of your business also collect or generate larger volumes of data at faster rates than others. Ensuring that your virtualization layer can access the most current data in these fast-changing sources may require special preparation or translation capabilities in the virtual layer. Identify these sources before launching the broader project.
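One common way a virtual layer can treat sources differently is a per-source freshness policy: fast-changing sources are always read live, while slow-changing ones may be served from a short-lived cache. The sketch below assumes that approach; the `SourceAccess` class, the one-hour TTL, the two fetch functions and the fake clock (used so the behavior is deterministic) are all hypothetical.

```python
import time


class SourceAccess:
    """Reads one source; optionally caches results for ttl_seconds (None = always live)."""

    def __init__(self, fetch, ttl_seconds=None, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cached = None
        self._fetched_at = None

    def read(self):
        if self._ttl is None:  # fast-changing source: always go to the source
            return self._fetch()
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._cached = self._fetch()  # slow-changing source: refresh only when expired
            self._fetched_at = now
        return self._cached


fake_now = [0.0]  # controllable clock for the demo


def clock():
    return fake_now[0]


counter = {"n": 0}


def fetch_reference_data():  # hypothetical slow-changing source, e.g. a product catalog
    counter["n"] += 1
    return f"catalog v{counter['n']}"


stream = iter(range(100, 200))


def fetch_live_prices():  # hypothetical fast-changing source
    return next(stream)


catalog = SourceAccess(fetch_reference_data, ttl_seconds=3600, clock=clock)
prices = SourceAccess(fetch_live_prices)  # no TTL: every read hits the source

print(catalog.read(), catalog.read())  # catalog v1 catalog v1 (second read from cache)
print(prices.read(), prices.read())    # 100 101
fake_now[0] = 3600.0
print(catalog.read())                  # catalog v2 (TTL expired)
```

The TTL acts as a tuning knob: leave it at `None` for sources where stale data is unacceptable, and raise it for reference data that rarely changes.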
- Understand Your Operations
It may be easiest to ‘virtualize’ the operations that will provide the best and fastest return on your enterprise data virtualization (EDV) investment. Identify the elements that would respond fastest to improved data analysis, and consider starting your project by virtualizing those databases.
Implementing Your Solution
Now you’re ready to implement your data virtualization project. To start, you can try one of the following approaches:
- Data Blending: A DV solution can merge with your business intelligence (BI) tool’s semantic universe layer or can be added as a new module, combining multiple data sets to feed BI-specific enterprise tools.
- Data Services Module: Typically offered by data integration suite or data warehouse vendors, this platform delivers robust data modeling, transformation and quality functions.
- SQLification Products: This emerging offering “virtualizes” underlying big data technologies and allows them to be combined with relational data sources and flat files — with querying done using standard SQL.
- Cloud Data Services: These data virtualization products offer prepackaged integrations for SaaS (software as a service) and other cloud architectures and databases, as well as many on-premises tools, such as Excel, that are implemented on private enterprise clouds. They expose normalized APIs across cloud sources for easy data exchange in projects with medium-sized data sets.
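The SQLification idea, querying relational data and flat files together with standard SQL, can be approximated with Python’s built-in `sqlite3` module. This is a simplified stand-in, not how SQLification products work internally: each CSV file is exposed as a temporary table only for the duration of one query and then dropped. The `query_with_files` helper and the `stores`/`sales` data are hypothetical.

```python
import csv
import io
import sqlite3


def query_with_files(con, sql, flat_files):
    """Run one standard SQL query over a relational connection plus CSV flat files.

    flat_files maps a table name to CSV text (header row first). Each file is loaded
    into a TEMP table for this call only and dropped afterwards. Identifiers are not
    escaped, so this is only suitable for trusted, illustrative input.
    """
    created = []
    try:
        for table, text in flat_files.items():
            header, *body = list(csv.reader(io.StringIO(text)))
            cols = ", ".join(f'"{c}"' for c in header)
            con.execute(f'CREATE TEMP TABLE "{table}" ({cols})')
            created.append(table)
            marks = ", ".join("?" for _ in header)
            con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', body)
        return con.execute(sql).fetchall()
    finally:
        for table in created:
            con.execute(f'DROP TABLE temp."{table}"')


# Hypothetical relational source.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, region TEXT)")
db.executemany("INSERT INTO stores VALUES (?, ?)", [(1, "North"), (2, "South")])

# Hypothetical flat file; CSV values arrive as text, so the query casts them.
SALES_CSV = "store_id,amount\n1,100\n2,40\n1,60\n"

result = query_with_files(
    db,
    """
    SELECT s.region, SUM(CAST(x.amount AS REAL))
    FROM stores AS s
    JOIN sales AS x ON s.id = CAST(x.store_id AS INTEGER)
    GROUP BY s.region
    ORDER BY s.region
    """,
    {"sales": SALES_CSV},
)
print(result)  # [('North', 160.0), ('South', 40.0)]
```

The consumer writes ordinary SQL and never needs to know that `sales` is a flat file rather than a database table.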
In the end, data virtualization’s location-transparent architecture, coupled with large-scale analytics architectures, naturally supports applications in a hybrid cloud environment. It goes beyond tiered views and delegable query execution to support enterprise growth. Overall, implementing your own data virtualization approach will let you derive information from your data faster.
POST WRITTEN BY
Director, Digital Modernization | Principal Architect | Technology Evangelist for Sage IT Inc.