Introduction to Data Virtualization

Rick F. van der Lans, in Data Virtualization for Business Intelligence Systems, 2012

1.5.3 Data Virtualization versus Data Federation

Very often, the term data virtualization is associated with the term data federation. Some consider them synonyms, but others see data virtualization as an extended form of data federation. In this section we explain what the relationship is between these two and what the differences are.

Federation stems from the Latin words foedus and foederis, which both mean treaty, agreement, and contract. In most cases, if the term federation is used, it refers to combining autonomously operating objects. For example, states can be federated to form one country, or companies can operate as a federation.

According to businessdictionary.com, a federation is an organization that consists of a group of smaller organizations or companies that works to bring attention to issues that are of importance to all of its members. Each organization that comprises the federation maintains control over its own operations. For example, a group of small businesses in a related industry might form a federation in order to lobby the government for laws favorable to small businesses. And according to Merriam-Webster.com, a federation is an encompassing political or societal entity formed by uniting smaller or more localized entities. Each member (for example, an individual federate state) can operate independently as well.

If we apply this general explanation to the term data federation, it means combining autonomous data stores to form one large data store. From this, we can derive the following definition:

Data federation is an aspect of data virtualization where the data stored in a heterogeneous set of autonomous data stores are made accessible to data consumers as one integrated data store by using on-demand data integration.

This definition is based on the following concepts:

Data virtualization: Data federation is an aspect of data virtualization. Note that data virtualization doesn’t always imply data federation. For example, if the data in one particular data store have to be virtualized for a data consumer, there is no need for data federation. But data federation always leads to data virtualization, because if a set of data stores is presented as one, the aspect of distribution is hidden from the applications.

Heterogeneous set of data stores: Data federation should make it possible to bring data together from data stores using different storage structures, different access languages, and different APIs. A data consumer using data federation should be able to access different types of database servers and files with various formats; it should be able to integrate data from all those data sources; it should offer features for transforming the data; and it should allow the data consumers and tools to access the data through various APIs and languages.

Autonomous data stores: Data stores accessed by data federation are able to operate independently—in other words, they can be used outside the scope of data federation.

One integrated data store: Regardless of how and where data is stored, it should be presented as one integrated data set. This implies that data federation involves transformation, cleansing, and possibly even enrichment of data.

On-demand data integration: This refers to when the data from a heterogeneous set of data stores are integrated. With data federation, integration takes place on the fly, not in batch: only when the data consumers ask for data are the data accessed and integrated. So the data are not stored in an integrated way but remain in their original location and format; a minimal sketch of this on-demand style follows.
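
The Python sketch below uses two hypothetical source accessors and invented field names; the point is only that the autonomous stores are read and combined at the moment a data consumer asks, with nothing staged or copied beforehand.

    # Hypothetical accessors for two autonomous data stores.
    def fetch_customers_from_crm():
        return [{"id": 1, "name": "Jones", "segment": "retail"}]

    def fetch_orders_from_erp():
        return [{"customer_id": 1, "amount": 250.0}]

    def customer_orders():
        """Integrate the two stores on demand; nothing is persisted in integrated form."""
        customers = {c["id"]: c for c in fetch_customers_from_crm()}
        return [
            {**customers[o["customer_id"]], "amount": o["amount"]}
            for o in fetch_orders_from_erp()
            if o["customer_id"] in customers
        ]

    print(customer_orders())  # the data are accessed and integrated only now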

Now that we know what data federation and data virtualization mean, we can make the following statements. First, data federation always implies multiple data stores, whereas data virtualization does not. Second, data federation can be seen as an aspect of data virtualization.

Final note: Until a few years ago, all the products available for data virtualization were called data federation servers. The main reason was that these products were primarily developed to make multiple databases look like one. Nowadays they are called data virtualization servers. The reason for the name change is that the “new” data virtualization servers offer much more functionality than the older ones. For example, some of those new products even support data modeling, data cleansing, and database language translation. Occasionally, this name change does cause confusion. In this book, we use the term data virtualization server.

URL: https://www.sciencedirect.com/science/article/pii/B9780123944252000010

Semantic Web application architecture

Dean Allemang, Jim Hendler, in Semantic Web for the Working Ontologist (Second Edition), 2011

Data Federation

The RDF data model was designed from the beginning with data federation in mind. Information from any source is converted into a set of triples so that data federation of any kind—spreadsheets and XML, database tables and web pages—is accomplished with a single mechanism. As shown in Figure 4.2, this strategy of federation converts information from multiple sources into a single format and then combines all the information into a single store. This is in contrast to a federation strategy in which the application queries each source using a method corresponding to that format. RDF does not refer to a file format or a particular language for encoding data but rather to the data model of representing information in triples. It is this feature of RDF that allows data to be federated in this way. The mechanism for merging this information, and the details of the RDF data model, can be encapsulated into a piece of software—the RDF store—to be used as a building block for applications.

The strategy of federating information first and then querying the federated information store separates the concerns of data federation from the operational concerns of the application. Queries written in the application need not know where a particular triple came from. This allows a single query to seamlessly operate over multiple data sources without elaborate planning on the part of the query author. This also means that changes to the application to federate further data sources will not impact the queries in the application itself.
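
As a minimal illustration of this federate-first, query-later strategy, the Python sketch below uses the rdflib library (assumed to be installed) as the RDF store; the file names and the query are hypothetical. Triples from both sources end up in one graph, and the query has no notion of where any triple originated.

    from rdflib import Graph

    g = Graph()                                  # the RDF store holding the federated graph
    g.parse("customers.ttl", format="turtle")    # triples derived from, say, a database table
    g.parse("products.rdf", format="xml")        # triples derived from, say, an XML feed

    # The SPARQL query runs over the merged triples; it neither knows nor cares
    # which source contributed which triple.
    for row in g.query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"):
        print(row)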

This feature of RDF applications forms the key to much of the discussion that follows. In our discussion of RDFS and OWL, we will assume that any federation necessary for the application has already taken place; that is, all queries and inferences will take place on the federated graph. The federated graph is simply the graph that includes information from all the federated data sources over which application queries will be run.

URL: https://www.sciencedirect.com/science/article/pii/B9780123859655100044

P/FDM Mediator for a Bioinformatics Database Federation

Graham J.L. Kemp, Peter M.D. Gray, in Bioinformatics, 2003

9.1.4 Mediator Architecture

The role of the mediator is to process queries expressed against the federation's integration schema (CM). The mediator holds meta-data describing the integration schema and also the external schemas of each of the federation's data resources (ER). In P/FDM, these meta-data are held, for convenience of pattern matching, as Prolog clauses compiled from high-level schema descriptions.

The architecture of the P/FDM Mediator is shown in Figure 9.6. The main components of the mediator are described in the following paragraphs.

FIGURE 9.6. Mediator architecture. The components of the mediator are shown inside the dashed line.

The parser module reads a Daplex query (Daplex is the query language for the FDM), checks it for consistency against a schema (in this case the integration schema), and produces a list comprehension containing the essential elements of the query in a form that is easier to process than Daplex text (this internal form is called ICode).

The simplifier's role is to produce shorter, more elegant, and more consistent ICode, mainly through removing redundant variables and expressions (e.g., if the ICode contains an expression equating two variables, that expression can be eliminated, provided that all references to one variable are replaced by references to the other), and flattening out nested expressions where this does not change the meaning of the query. Essentially, simplifying the ICode form of a query makes the subsequent query processing steps more efficient by reducing the number of equivalent ICode combinations that need to be checked.
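
The ICode representation itself is not shown in this excerpt, so the following Python sketch only mimics the simplification described, using an invented list-of-tuples stand-in: equality expressions are dropped and one variable is substituted for the other throughout.

    # Invented stand-in for ICode: ('eq', a, b) equates two variables; any other
    # tuple is an expression that may mention variables by name.
    def simplify(icode):
        subst, rest = {}, []
        for expr in icode:
            if expr[0] == "eq":
                subst[expr[2]] = expr[1]   # drop the equality, remember b -> a
            else:
                rest.append(expr)
        def resolve(term):
            while term in subst:           # follow chains such as c -> b -> a
                term = subst[term]
            return term
        return [tuple(resolve(part) for part in expr) for expr in rest]

    icode = [("eq", "x", "y"), ("filter", "y", ">", 10), ("project", "x")]
    print(simplify(icode))  # -> [('filter', 'x', '>', 10), ('project', 'x')]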

The rule-based rewriter matches expressions in the query with patterns present on the left-hand side of declarative rewrite rules and replaces these with the right-hand side of the rewrite rule after making appropriate variable substitutions. Rewrite rules can be used to perform semantic query optimization. This capability is important because graphical interfaces make it easy for users to express inefficient queries that cannot always be optimized using general purpose query optimization strategies. This is because transforming the original query to a more efficient one may require domain knowledge (e.g., two or more alternative navigation paths may exist between distantly related object classes but domain knowledge is needed to recognize that these are indeed equivalent).
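
P/FDM’s actual rule syntax is given in [20] and is not reproduced here; the toy Python rewriter below only illustrates the mechanism described: a left-hand pattern with variables is matched against a query expression, and the bindings are substituted into the right-hand side. The navigation-path rule at the end is a made-up example of the kind of domain knowledge mentioned above.

    # Symbols beginning with '?' are pattern variables; everything else must match literally.
    def match(pattern, expr, bindings):
        if isinstance(pattern, str) and pattern.startswith("?"):
            bindings[pattern] = expr
            return True
        if isinstance(pattern, tuple) and isinstance(expr, tuple) and len(pattern) == len(expr):
            return all(match(p, e, bindings) for p, e in zip(pattern, expr))
        return pattern == expr

    def substitute(template, bindings):
        if isinstance(template, str) and template.startswith("?"):
            return bindings[template]
        if isinstance(template, tuple):
            return tuple(substitute(t, bindings) for t in template)
        return template

    def rewrite(expr, rules):
        for lhs, rhs in rules:
            bindings = {}
            if match(lhs, expr, bindings):
                return substitute(rhs, bindings)
        return expr   # no rule applied

    # Hypothetical semantic rule: replace an indirect navigation path by an
    # equivalent direct one that domain knowledge says is cheaper to evaluate.
    rules = [(("via_intermediate", "?a", "?b"), ("direct", "?a", "?b"))]
    print(rewrite(("via_intermediate", "gene", "chromosome"), rules))
    # -> ('direct', 'gene', 'chromosome')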

A recent enhancement to the mediator is an extension to the Daplex compiler that allows generic rewrite rules to be expressed using a declarative high-level syntax [20]. This makes it easy to add new query optimization strategies to the mediator.

The optimizer module performs generic query optimization.

The reordering module reorders expressions in the ICode to ensure that all variable dependencies are observed.

The condition compiler reads declarative statements about conditions that must hold between data items in different external data resources so these values can be mapped onto the integration schema.

The ICode rewriter expands the original ICode by applying mapping functions that transform references to the integration schema into references to the federation's component databases. Essentially the same rewriter mentioned previously is used here, but with a different set of rewrite rules. These rewrite rules enhance the ICode by adding tags to indicate the actual data sources that contain particular entity classes and attribute values. Thus, the ICode rewriter transforms the query expressed against the CM into a query expressed against the ER of one or more external databases.

The crucial idea behind the query splitter is to move selective filter operations in the query down into the appropriate chunks so they can be applied early and efficiently using local search facilities as registered with the mediator [KIG94]. The mediator identifies which external databases hold data referred to by parts of an integrated query by inspecting the meta-data, and adjacent query elements referring to the same database are grouped together into chunks. Query chunks are shuffled and variable dependencies are checked to produce alternative execution plans. A generic description of costs is used to select a good schedule/sequence of instructions for accessing the remote databases.
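
The sketch below (Python, with invented source tags and query elements) shows only the chunking step: adjacent query elements that refer to the same database are grouped into one chunk, keeping selective filters next to the access they restrict so each chunk can be shipped to its source in one piece.

    from itertools import groupby

    # Each element of the rewritten query carries a tag naming the data source it refers to.
    tagged_icode = [
        ("swissprot", "get protein p"),
        ("swissprot", "filter p.species = 'human'"),   # selective filter stays with its chunk
        ("pdb", "get structure s for p"),
        ("pdb", "filter s.resolution < 2.0"),
    ]

    chunks = [
        (source, [element for _, element in group])
        for source, group in groupby(tagged_icode, key=lambda tagged: tagged[0])
    ]
    for source, elements in chunks:
        print(source, "->", elements)   # one chunk per remote database access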

Each ICode chunk is sent to one of several code generators. These translate ICode into queries that are executable by the remote databases, transforming query fragments from ER to CR. New code generators can be linked into the mediator at runtime.

Wrappers handle communication with the external data resources. They consist of two parts: code responsible for sending queries to remote resources and code that receives and parses the results returned from those resources. Wrappers for new resources can be linked into the mediator at runtime. Note that a wrapper can only make use of whatever querying facilities are provided by the federation's component databases. Thus, the mediator's conceptual model (CM) will only be able to map onto those data values that are identified in the remote resource's conceptual model (CR). For example, queries involving concepts like gene and chromosome in the CM can only be transformed into queries that run against a remote resource if that resource exports these concepts.
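
As a schematic illustration of this two-part structure, here is a minimal Python wrapper; the wire format, the canned reply, and the query text are all invented.

    class Wrapper:
        """Couples a query-sending part with a result-parsing part for one remote resource."""

        def __init__(self, send, parse):
            self._send = send      # part 1: ships the query to the remote resource
            self._parse = parse    # part 2: turns the raw reply into usable records

        def run(self, query_text):
            return self._parse(self._send(query_text))

    # Hypothetical wrapper for a resource that answers with tab-separated lines.
    demo = Wrapper(
        send=lambda q: "P69905\tHBA_HUMAN\nP68871\tHBB_HUMAN",   # stands in for a network call
        parse=lambda raw: [line.split("\t") for line in raw.splitlines()],
    )
    print(demo.run("find human haemoglobin entries"))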

The result fuser provides a synchronization layer, which combines results retrieved from external databases so the rest of the query can proceed smoothly. The result fuser interacts tightly with the wrappers.

URL: https://www.sciencedirect.com/science/article/pii/B9781558608290500115

The Future of Data Virtualization

Rick F. van der Lans, in Data Virtualization for Business Intelligence Systems, 2012

13.5.2 Beyond Looking Under the Hood

The trick is to gain competitive advantage by accelerating the delivery of critical data and reports and to be able to trust and consume them instantly. But data virtualization must be done right to support these critical success factors. Very often, data virtualization borrows heavily from its data federation legacy. The primary use case of data federation is to access and merge already cleaned and conformed data in real time, leaving the heavy lifting that makes this possible to other processing logic. So the time advantage gained is lost once one realizes the federated data had to be prepped for federation. As a result, the ROI simply disappears.

So do go beyond looking under the hood, and ask a few hard questions. To what extent does the solution support data transformation? Is it nominal, limited to what you can do programmatically through SQL or XQuery? Is there any data profiling, or will you require staging and further processing? Is it profiling of both logic and sources, just sources, or neither? Are data cleansing and conforming simplistic, hand-coded, or nonexistent? How about reuse? Can you quickly and easily reuse virtual views for any use case, including batch? To do data virtualization right, it requires a deep and thorough understanding of very complex problems that exist in the data integration domain.

So what’s our perspective? Simply put, data virtualization must take a page from the world of virtual machines. Data virtualization must do the heavy lifting of accessing, profiling, cleansing, transforming, and delivering federated data to and from any application, on-demand. It must handle all the underlying data complexity in order to provide conformed and trusted data, reusing the logic for either batch or real-time operation, whether through SQL, SOA, REST, JSON, or new acronyms yet to be specified. Data virtualization must be built from the ground up on the “cut the wait and waste” best practices discussed in the book Lean Integration by Schmidt and Lyle (see [65]).

By starting with a logical data model; giving business and IT role-based visibility to the data early in the process; enabling data profiling on federated data to show and resolve the data integrity issues; applying advanced transformations, including data quality in real-time to federated data; and completely and instantly reusing the integration logic or virtual views for batch or real-time, you can cut the wait and waste throughout the process. By leveraging optimizations, parallelism, pipelining, identity resolution, and other complex transformational capabilities that you can find only in a mature data integration platform, data virtualization can enable more agile business intelligence.

Finally, with enterprises generating huge volumes of data, the types of data changing enormously, and the need for faster data processing, data virtualization can maximize the return on data. You can ensure that immediate action is taken on new insights derived from both big data and existing data. You can combine on-premise data with data in the cloud, on-demand. With the world becoming more mobile, you can provide access to disparate data by provisioning it to any device. Done right, data virtualization can give you the agile data integration foundation you need to embrace what we call secular megatrends: social, mobile, cloud, and big data.

URL: https://www.sciencedirect.com/science/article/pii/B9780123944252000137

Data Requirements Analysis

David Loshin, in Business Intelligence (Second Edition), 2013

Develop Source-to-Target Mapping

The objective of this process is to both identify the data elements in source data systems that are potentially incorporated into analytical processing and understand any requirements for transformations or alignments to be performed. These modifications may be needed to retain semantic alignment when data sets are merged from multiple sources. At the same time, the analysts must determine the level of atomic data needed for drill-down to satisfy downstream user needs for more precise investigation or analysis.

These transformations will specify how upstream data elements are modified for downstream consumption, as well as business rules applied as part of the information flow. This process will also help link source data elements to shared reference metadata that can be standardized for analysis and for presentation purposes.

Figure 7.4 shows the sequence of these steps:

Figure 7.4. Source-to-target mapping.

1. Propose target models for extraction and sharing. Evaluate the catalog of identified data elements and look for those that are frequently created, referenced, or modified. By considering both the conceptual and the logical structures of these data elements and their enclosing data sets, the analyst can identify potential differences and anomalies inherent in the metadata, and then resolve any critical anomalies across data element sizes, types, or formats. These will form the core of a data sharing model, which represents the data elements to be taken from the sources, potentially transformed, validated, and then provided to the consuming applications. These models may evolve into a unified view for virtualized data services in a data federation environment.

2. Identify candidate data sources. Consult the data management teams to review the candidate data sources containing the identified data elements, and review the collection of facts needed by the reporting and analysis applications. For each fact, determine if it corresponds to a defined data concept or data element, if it exists in any data sets in the organization, if it is a computed value (and if so, what are the data elements that are used to compute that value), and then document each potential data source.

3. Develop source-to-target mappings. Since this analysis should provide enough input to specify which candidate data sources can be extracted, the next step is to consider how that data is to be transformed into a common representation that is then normalized in preparation for consolidation. The consolidation processes collect the sets of objects and prepare them for populating the consuming applications. During this step, the analysts enumerate which source data elements contribute to target data elements, specify the transformations to be applied, and note where the mapping relies on standardizations and normalizations revealed during earlier stages of the process; a sketch of such a mapping follows this list.
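
In Python, such a mapping might be written down as below; the element names and transformations are hypothetical. Each target element records the contributing source elements and the transformation (including standardization) applied on the way in, which is exactly the information enumerated in step 3.

    # Hypothetical source-to-target mappings.
    mappings = [
        {
            "target": "customer.full_name",
            "sources": ["crm.cust.first_nm", "crm.cust.last_nm"],
            "transform": lambda first, last: f"{first.strip()} {last.strip()}".title(),
        },
        {
            "target": "customer.country_code",
            "sources": ["billing.addr.country"],
            "transform": lambda country: {"United States": "US", "Germany": "DE"}.get(country, country),
        },
    ]

    source_row = {
        "crm.cust.first_nm": " mary ",
        "crm.cust.last_nm": "smith",
        "billing.addr.country": "United States",
    }

    target_row = {
        m["target"]: m["transform"](*(source_row[s] for s in m["sources"]))
        for m in mappings
    }
    print(target_row)  # {'customer.full_name': 'Mary Smith', 'customer.country_code': 'US'}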

URL: https://www.sciencedirect.com/science/article/pii/B9780123858894000077

Strategy, Scope, and Approach

Mark Allen, Dalton Cervo, in Multi-Domain Master Data Management, 2015

Data Integration and Synchronization

It’s been stated already that the fundamental issue MDM addresses is the duplication, fragmentation, and inconsistency of data across multiple sources behind systems that are continuously operating in silos. Since data integration involves combining data from several sources, it becomes a required core competency in MDM programs.

There are many flavors of data integration, but they can be largely placed into either the physical or the virtual category. Physical data integration implies making a copy of the data, while virtual integration doesn’t. An enterprise data warehouse is a typical physical data integration implementation, while data federation is a technique to virtually integrate multiple data sources. In a pure sense, data migration doesn’t necessarily mean data integration because you could be moving data around without truly integrating any data. However, if data migration is in the context of an MDM program, it will certainly carry physical data integration aspects with it.

Data synchronization is the process of establishing consistency among systems, followed by continuous updates to maintain that consistency. According to this definition, data synchronization presupposes an ongoing integration effort. Therefore, data synchronization is the continuing realization of a data integration activity. Because of its maintenance characteristics and associated costs and risks, data synchronization has a huge influence on the selection of an MDM architecture. This consideration will be expanded in Chapter 3, where multi-domain technologies are explored, as well as in Chapter 8, as part of a discussion of data integration.

When looking to leverage data integration and synchronization across multiple domains, it is necessary to remember the difference between their definitions: data synchronization implies ongoing data integration, while data integration in itself doesn’t have a recurring aspect to it. This matters because domains will have different requirements about how data is brought together, as well as how often replicated data needs to be kept in sync; the sketch below illustrates the distinction.
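
In Python, and with invented source accessors, the distinction amounts to this: integration combines the sources once, while synchronization is the same work placed on a recurring schedule.

    import time

    def integrate(sources):
        merged = {}
        for source in sources:
            merged.update(source())      # combine records keyed by a shared identifier
        return merged

    def synchronize(sources, target, interval_seconds, rounds=3):
        for _ in range(rounds):          # in practice this loop would run indefinitely
            target.update(integrate(sources))
            time.sleep(interval_seconds)

    crm = lambda: {"c1": {"name": "Jones"}}
    erp = lambda: {"c2": {"name": "Smith"}}
    hub = integrate([crm, erp])                            # one-time data integration
    # synchronize([crm, erp], hub, interval_seconds=3600)  # ongoing data synchronization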

As with all other functions, the more mature a company becomes in one particular area, the more capable and nimble it will be to reapply related knowledge, experiences, methods, processes, and procedures in successive domains. With that said, certain aspects of data integration and data synchronization can indeed be generic for multiple domains, while others are not so much. It also depends very heavily on the type of the MDM implementation. For example, if one domain is implemented using a federated system, there is a need for initial data integration to achieve proper data linkage, but there is no need for data synchronization. There are many nuances that will be covered in a lot more detail later in this book. Generically, however, there is always a need for integration since the starting point is discrepant sources, but there may not always be a need for synchronization since a single copy might exist after data is integrated into a multi-domain MDM hub.

URL: https://www.sciencedirect.com/science/article/pii/B9780128008355000014

Data Virtualization

William McKnight, in Information Management, 2014

Use Cases for Data Virtualization

Given the heterogeneous information management architecture, the goal of eliminating unnecessary redundancy, and the capabilities of data virtualization, we land data in its best spot to succeed and go from there.

Data virtualization is not a materialized view, which is always a physical structure.

Data Virtualization Use Cases

Composite Software, a prominent data virtualization vendor and part of Cisco Systems, organizes the data virtualization use cases as follows:

BI data federation

Data warehouse extensions

Enterprise data virtualization layer

Big data integration

Cloud data integration

There is some obvious overlap between these. For example, most data warehouses are built for business intelligence, so extending the warehouse virtually actually provides federation for BI. This form of virtualization is helpful in augmenting warehouse data with data that doesn’t make it to the warehouse in the traditional sense, but nonetheless is made available as part of the warehouse platform. Big data integration refers to the integration of data in Hadoop, NoSQL systems, large data warehouses and data warehouse appliances. Finally, the cloud is presented as a large integration challenge that is met by data virtualization.

Master Data Management

Master Data Management (MDM), discussed in Chapter 7, is built for governing data and distributing that data. The distribution of MDM data has a significant architectural aspect to it. MDM data does not have to be physically distributed to a similar structure residing in the target system that wants the data. Depending on the frequency of access and the concurrency requirements on the MDM hub itself, MDM data can stay in the hub and be joined to data sets far and wide in the No-Reference Architecture. Ultimately, MDM will be the highest value usage for data virtualization.

When the structure you wish to join MDM (relational) data with is not relational, you may create a separate relational store for use with the nonrelational data (which would still necessitate data virtualization) or you can utilize the main MDM hub for data virtualization.

Data virtualization is not synchronization, which is keeping two separate data stores consistent in data content without much time delay.

MDM data can increase the value of Hadoop data immensely. Hadoop is going to have low-granularity, transaction-like data without information, other than perhaps a key, about company dimensions like customer. Those transactions can be analyzed “on the face” of the transaction, which has some value, but bringing in information not found in the transaction, yet relevant to it, is much more valuable.

If you are analyzing people’s movements across the store based on sensor devices, it is helpful to know a generic person’s pattern, but it is more helpful to know that it is Mary Smith, who lives at 213 Main Street (an upscale geo), has a lifetime value that puts her in the second decile, has two kids, and prefers Nike clothing. You can design the store layout based on the former, but you can make targeted offers, too, based on the latter.
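
A minimal Python sketch of this kind of enrichment at query time follows; the hub contents, keys, and event fields are invented. The transaction-like events carry only a customer key, and the dimensional detail is pulled from the MDM hub when the analysis runs.

    # Hypothetical MDM hub content: the governed, dimensional view of customers.
    mdm_customers = {
        "C-213": {"name": "Mary Smith", "geo": "upscale", "ltv_decile": 2,
                  "kids": 2, "preferred_brand": "Nike"},
    }

    # Hypothetical low-granularity sensor events that know the customer only by key.
    sensor_events = [
        {"customer_key": "C-213", "zone": "footwear", "dwell_seconds": 95},
        {"customer_key": "C-999", "zone": "grocery", "dwell_seconds": 12},
    ]

    enriched = [
        {**event, **mdm_customers.get(event["customer_key"], {})}
        for event in sensor_events
    ]
    for row in enriched:
        print(row)   # the first event now carries the dimensional view from the hub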

A similar analogy applies to MDM and Data Stream Processing (Chapter 8), which has to do with real-time data analysis. Analyzing the stream (by human or machine) along with the dimensional view provided by MDM means you can customize the analysis to the customer, product characteristics, and location characteristics. While a $10,000 credit card charge may raise an alert for some, it is commonplace for others. Such limits and patterns can be crystalized in MDM for the customers and utilized in taking the next best action as a result of the transaction.

Data virtualization cannot provide transactional integrity across multiple systems, so data virtualization is not for changing data. It is for accessing data.

Because data streams are not relational, the MDM hub could be synchronized to a relational store dedicated to virtualization with the stream processing or the stream processing could utilize the MDM hub. The decision would be made based on volume of usage of the hub, physical proximity of the hub, and concurrency requirements to the hub.

Adding numerous synchronization requirements to the architecture by adding numerous hubs can add undue overhead to the environment. Fortunately, most MDM subscribers today are relational and have a place in the data store for the data to be synchronized to.

Mergers and Acquisitions

In the case of a merger or acquisition (M&A), immediately there are redundant systems that will take months to years to combine. Yet also immediately there are reporting requirements across the newly merged concept. Data virtualization can provide those reports across the new enterprise. If the respective data warehouses are at different levels of maturity, are on different database technologies or different clouds, or differ in terms of being relational or not, it does not matter. Obviously, these factors also make the time to combine the platforms longer, and possibly not even something that will be planned.

Data virtualization has the ability to perform transformation on its data, but—as with data integration—the more transformation, the less performance. With virtualization happening at the time of data access, such degradations are magnified. Approach data virtualization involving heavy transformation and CPU-intensive processing with caution.

I am also using M&A as a proxy for combining various internal, non-M&A based fiefdoms of information management into a single report or query. While the act of an M&A may be an obvious use case, companies can require M&A-like cross-system reporting at any moment. This is especially relevant when little information management direction has been set in the organization and a chaotic information environment has evolved.

Temporary Permanent Solution

The need to deliver business intelligence per requirements may outweigh your ability to perform the ETL/data integration required to physically commingle all the data needed for the BI. Data integration is usually the most work-intensive aspect of any business intelligence requirement.

As you set up the information management organization for performing the two large categories of work required—development and support—you will need to consider where you draw the line. Development is usually subjected to much more rigorous justification material, rigorous prioritization, and project plans or agile set up. Support is commonly First In, First Out (FIFO), queue-based work that is of low scope (estimated to be less than 20 person-hours of effort).

Having run these organizations, I’ve become used to doing quick, and pretty valid, estimations of work effort. I know that when the work involves data integration, it likely exceeds my line between development and support, so I put that in the development category. Unfortunately, now that we can no longer rely on a row-based, scale-up, costly data warehouse to meet most needs, quite often we’ve placed data in the architecture in disparate places.

With data virtualization, the business intelligence can be provided much more rapidly because we are removing the data integration step. However, performance may suffer a bit compared to a physically coupled data set.

If you look at 100 BI requirements, perhaps 15 will be interesting in 6 months. There are 85 requirements that you will fulfill that serve a short-term need. This does not diminish their importance. They still need to be done, but it does mean they may not need to be quite as ruggedized. One study by the Data Warehousing Institute showed the average change to a data warehouse takes 7–8 weeks.

At Pfizer, cross-system BI requirements will be met initially through data virtualization in the manner I described. After a certain number of months, if the requirement is still interesting, they will then look into whether the performance is adequate or should be improved with data integration.

You need data virtualization capability in the shop in order to compete like this. Talk about being agile (Chapter 16)! If you accept the premise of a heterogeneous environment, data virtualization is fundamental. I would also put cloud computing (Chapter 13), data governance, and enterprise data integration capabilities in that category.

Stand-alone virtualization tools have a very broad base of data stores they can connect to, while embedded virtualization in tools providing capabilities like data storage, data integration, and business intelligence tends to provide virtualization to other technology within their close-knit set of products, such as partners. You need to decide if independent virtualization capabilities are necessary. The more capabilities, the more flexibility and ability to deliver you will have.

Simplifying Data Access

Abstracting complexity away from the end user is part of self-service business intelligence, discussed in Chapter 15. Virtual structures are as accessible as physical structures once you have crossed the bridge to data virtualization. Just as physical structures are data sources, so are virtual structures. This increases the data access possibilities exponentially.

Data virtualization also abstracts the many ways and APIs to access some of the islands of data in the organization such as ODBC, JDBC, JMS, Java Packages, SAP MAPIs, etc. Security can also be managed at the virtual layer using LDAP or Active Directory.
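
To a consumer, that abstraction can look as simple as the Python sketch below, which reaches a virtual view through plain ODBC using the pyodbc library (assumed to be installed); the DSN, credentials, and view name are hypothetical. Which underlying stores, formats, and APIs actually satisfy the query is the virtualization layer's concern.

    import pyodbc

    # Connect to the data virtualization server like any other ODBC data source.
    conn = pyodbc.connect("DSN=virtual_layer;UID=report_user;PWD=secret")
    cursor = conn.cursor()

    # Plain SQL against a virtual view; the server resolves it across the real sources.
    cursor.execute(
        "SELECT customer_name, total_spend FROM v_customer_360 WHERE region = ?",
        "EMEA",
    )
    for customer_name, total_spend in cursor.fetchall():
        print(customer_name, total_spend)

    conn.close()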

When Not to do Data Virtualization

There may be some reports that perform well enough and are run infrequently enough that it makes sense to virtualize them, but there is another perspective to consider: auditability. Going back to a point in time for a virtual (multi-platform) query is difficult to impossible, because it requires that all systems involved keep all historical data consistently. I would not trust my Sarbanes–Oxley compliance reports or reports that produce numbers for Wall Street to data virtualization. Physicalize those.

Similarly, when performance is king, as I’ve mentioned, you’ll achieve better performance with physical instantiation of all the data into a single data set. Mobile applications in particular (covered in Chapter 15) need advanced levels of performance, given the nature of their use.

While I have somewhat championed data virtualization in this chapter, these guidelines should serve as some guardrails as to its limitations. It is not a cure-all for bad design, bad data quality, or bad architecture.

Combining with Historical Data

At some point, in systems without temperature sensitivity that automatically routes data to colder, cheaper storage, many organizations will purposefully route older data to slower mediums. The best case for this data is that it is never needed again. However, it is kept around because it might be needed. If that time comes, the data is often joined with data in hot storage to form an analysis over a longer stretch of data than what is available there alone. The cross-platform query capabilities of data virtualization are ideal for such a case.
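
A rough Python sketch of that occasional cross-medium query follows; the store accessors, fields, and cutoff are invented. Recent years come from the hot store and older years from the archive, and the union is assembled only for this one analysis.

    def hot_store_sales(year):     # hypothetical fast store holding current data
        return [{"year": year, "region": "EMEA", "amount": 1200.0}]

    def archive_sales(year):       # hypothetical slower, cheaper archive
        return [{"year": year, "region": "EMEA", "amount": 950.0}]

    def sales_history(years, cutoff_year):
        rows = []
        for year in years:
            source = hot_store_sales if year >= cutoff_year else archive_sales
            rows.extend(source(year))
        return rows

    print(sales_history(range(2010, 2014), cutoff_year=2012))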

This also serves as a proxy for the “odd query” for data across mediums. This is the query that you do not want to spend a lot of time optimizing because it is infrequent. The users know that and understand.

URL: https://www.sciencedirect.com/science/article/pii/B9780124080560000096

Patterns for emerging application integration scenarios: A survey

Daniel Ritter, ... Stefanie Rinderle-Ma, in Information Systems, 2017

SOA and Mashups

Liu et al. combine several common architecture integration patterns, namely Pipes and Filters, Data Federation, and Model-View-Controller, to compose enterprise mashups [26]. Moreover, these patterns are customized for specific mashup needs. In [27], enterprise architecture integration patterns (e.g., Pipes and Filters, Data Federation, Model-View-Controller) are leveraged in order to compose reusable mashup components. The authors also present a service-oriented architecture that addresses reusability and integration needs for building enterprise mashup applications. The proposed solutions focus on SOA and mashups, but no solution for EIP and new trends is provided. The work by Braga et al. addresses issues of complexity of service compositions with adequate abstraction to give end users easy-to-use development environments [28]. Abstract formalisms must be equipped with suitable runtime environments capable of deriving executable service invocation strategies. The solution tends towards mashups and modeling, as users declaratively compose services in a drag-and-drop fashion while low-level implementation details are hidden. However, the solution could not be clearly identified and is hence not included in Table 1. Finally, Cetin et al. chart a road map for migration of legacy software to pervasive service-oriented computing [29]. Integration takes place even at the presentation layer. No solution is provided for EIP and trends; however, mashups are used as a migration strategy toward SOA for the Web 2.0 integration challenge.

URL: https://www.sciencedirect.com/science/article/pii/S0306437917301084