Pentaho ETL

Pentaho: ETL Data Integration

By: Fred Deese
Visual Connections President & CEO

Today's businesses draw on a vast and varied range of data sources. With CRMs, mobile apps, ERPs, and more in play, compiling and making use of information is often easier said than done. Yet you'd be hard-pressed to find anyone who disagrees that leveraging data strategically is vital to how businesses and agencies operate and grow.

In the past, businesses and agencies have leveraged data warehouses for a particular application or specific use. However, these data warehouses were limited to a specific schema. If data needed to be added for greater context or to advance source material, a lengthy manual process was necessary: sourcing new data, separately defining requirements, and manually rebuilding the data warehouse. On top of that, the schema had to be updated without breaking the existing code.

Today, automated data integration allows businesses and agencies to gather unstructured, semi-structured, and structured data from virtually any disparate source into one place. What does that mean? It means businesses and agencies can gain more in-depth insight and actionable intelligence, make better-informed decisions, and improve performance measurements to support organizational objectives. From granular questions to high-priority concepts, data integration applies to specific use cases that impact every department and team in your business or agency. These use cases include business intelligence, data enrichment, data quality, customer data analytics, and real-time information delivery.

Pentaho Data Integration (PDI) is part of the Pentaho open-source business intelligence suite.

A Modern Data Integration Solution

Modern data integration leverages data pipelines and an assortment of integrations to replace antiquated manual methods of data set management. Consider cloud-based data lakes and warehouse environments: you can now stream, deliver, and store the data you need in a centralized data warehouse such as Amazon Redshift.
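To make the idea of a data pipeline concrete, here is a minimal sketch of the extract, transform, and load stages in plain Python. It is not Pentaho code: the sample CSV, the `sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a warehouse such as Amazon Redshift.

```python
import csv
import io
import sqlite3

# Hypothetical sample export; in practice this would come from a CRM, ERP, or app.
RAW_CSV = """customer_id,region,amount
1, east ,120.50
2,WEST,80
3,east,not-a-number
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names and drop rows with invalid amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows that fail validation
        region = row["region"].strip().lower()
        clean.append((int(row["customer_id"]), region, amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Chain the three stages; SQLite is only a stand-in for a real warehouse."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    return conn
```

Calling `run_pipeline()` leaves two validated rows in the `sales` table and drops the row whose amount could not be parsed. A tool like PDI lets you design the same three stages visually instead of writing them by hand.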

So, let's discuss Pentaho as a solution. Pentaho combines data integration and analytical processing in one platform. The combination means rapid results for your business or agency in terms of making intelligent business decisions. Simply put, you get from raw data to visualizations and results faster, which leads to reduced risk, more efficient operations, improved delivery quality, and potentially new revenue streams.

What is Pentaho?

Pentaho is an open-source business intelligence product that provides an array of business intelligence solutions. Pentaho encompasses the aspects of data integration (ETL), data analysis, reporting, and dashboards. Once implemented, the Pentaho platform allows businesses and agencies to gain access to a range of information, including sales analysis, reporting, finance analysis, customers, product profitability, and multifaceted information leveraged by top management. Let’s briefly break out the facets of Pentaho:

  • Analysis: OLAP analysis (based on Mondrian) that encompasses JPivot views, advanced SVG or Flash graphics, integrated dashboard widgets, workflow and portal integration, and data mining

  • Dashboard: reusable display widgets that are embeddable into applications, JSR-168 compliant portals, or JSPs

  • Data Mining: integrates Weka to provide intelligent data analysis and machine learning algorithms, which can be applied to tasks jointly with OLAP

  • Data Integration: realized by an ETL tool called Kettle, whose Spoon component provides the graphical user interface for data processing

  • Reporting: provides scheduled and on-demand report dissemination in popular industry formats

How Does Pentaho Work?

Pentaho's data solutions are great for businesses and agencies in the government, healthcare, financial services, and retail sectors. Known for its ease of use, Pentaho takes a metadata-driven approach. For administrators and ETL developers, that means building data manipulation jobs without writing a single line of code. Furthermore, Pentaho offers a shared repository and remote ETL execution, and it simplifies the development process.
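As a rough illustration of what "metadata-driven" means, the sketch below describes a transformation as data and lets a small generic engine interpret it. This is a conceptual toy, not PDI's format: PDI saves transformations designed in Spoon as XML files, whereas the step names, dictionary keys, and functions here are hypothetical.

```python
# Hypothetical transformation described as metadata (data, not code).
TRANSFORMATION = [
    {"step": "rename", "from": "cust", "to": "customer"},
    {"step": "uppercase", "field": "country"},
    {"step": "filter", "field": "amount", "min": 100},
]


def rename(rows, cfg):
    """Rename one field in every row."""
    return [
        {(cfg["to"] if key == cfg["from"] else key): value for key, value in row.items()}
        for row in rows
    ]


def uppercase(rows, cfg):
    """Upper-case the value of one field in every row."""
    return [{**row, cfg["field"]: row[cfg["field"]].upper()} for row in rows]


def filter_min(rows, cfg):
    """Keep only rows whose field value is at least the configured minimum."""
    return [row for row in rows if row[cfg["field"]] >= cfg["min"]]


# The engine knows the step types; the metadata decides which ones run and how.
STEP_TYPES = {"rename": rename, "uppercase": uppercase, "filter": filter_min}


def run(transformation, rows):
    """Apply each configured step to the rows, in order."""
    for cfg in transformation:
        rows = STEP_TYPES[cfg["step"]](rows, cfg)
    return rows
```

Changing the pipeline means editing the `TRANSFORMATION` list rather than the engine, which is the same division of labor a graphical designer offers: the user arranges and configures steps, and the tool executes them.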

Here's a list of development tools for implementing the Pentaho ETL processes:

  • Spoon – graphical tool for designing data flows that read, write, validate, transform, and refine data across an array of data sources

  • Pan – command-line tool dedicated to executing transformations designed in Spoon

  • Kitchen – command-line tool that runs jobs designed in Spoon in batch mode

  • Carte – web server used for running and monitoring data integration tasks; allows for remote monitoring
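Because Pan and Kitchen are command-line tools, they are easy to call from a scheduler or script. The sketch below builds their command lines in Python. The install path, file paths, and parameter names are hypothetical; the `-file`, `-level`, and `-param` options follow PDI's documented command-line syntax, but check the documentation for your version before relying on them.

```python
import subprocess

# Hypothetical install location; adjust to wherever PDI is unpacked.
PDI_HOME = "/opt/pentaho/data-integration"


def _build(script, path, params, level):
    """Assemble the argument list shared by Pan and Kitchen."""
    cmd = [f"{PDI_HOME}/{script}", f"-file={path}", f"-level={level}"]
    for name, value in (params or {}).items():
        cmd.append(f"-param:{name}={value}")  # named parameter passed to the run
    return cmd


def pan_command(ktr_path, params=None, level="Basic"):
    """Command line for executing a transformation (.ktr) with Pan."""
    return _build("pan.sh", ktr_path, params, level)


def kitchen_command(kjb_path, params=None, level="Basic"):
    """Command line for running a job (.kjb) with Kitchen."""
    return _build("kitchen.sh", kjb_path, params, level)


def run(cmd):
    """Run the command and return its exit code (requires a PDI install)."""
    return subprocess.run(cmd).returncode
```

For example, `run(kitchen_command("/etl/nightly.kjb", {"RUN_DATE": "2024-01-31"}))` would launch a hypothetical nightly job and hand back the exit code, which a scheduler can use to detect failures.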

Providing data integration, data mining, reporting, OLAP services, dashboarding, and ETL solutions, Pentaho is a one-stop shop for business analytics needs. Here are some additional benefits:

  • Built on the Java/J2EE platform, making it highly customizable

  • Simple plug-in architecture

  • Supports deployment on single-node computers, cloud, or cluster

  • Cost-effective compared to other tools on the market

  • Open Source – supports Big Data

  • Cloud Analytics

  • User-Friendly Interface

  • Real-time analysis – in-memory data caching

  • Integrates a range of data sources, including Hadoop, NoSQL stores, relational databases, operational stores, and analytical databases

  • Extensive library of interactive visualizations to find inconsistencies and patterns (e.g., scatter/bubble charts, geo-mapping, and heat grids)

Pentaho Business Analytics & AWS Solutions

Pairing Pentaho with AWS supports strong data security and efficient storage. Amazon EBS encryption protects data at rest on the volume, and data moving between Amazon EC2 instances and EBS volumes is encrypted in transit. The EC2 firewall (security groups) restricts access to data and databases. In addition, Amazon's flexible access control tools allow businesses and agencies to track and monitor access to EBS volumes. Strong access controls combined with seamless encryption make for an effective security strategy for key data sets.

Furthermore, according to APN – Amazon Redshift Partners – Pentaho: 

“Pentaho has certified its business analytics and data integration platform to work with Amazon Redshift. Customers can now take advantage of both Redshift's automation of labor-intensive tasks such as setting up, operating and creating a data warehouse cluster and the power of Pentaho's big data analytics platform to cost-effectively improve business performance.”

- Richard Daley | Pentaho Founder & Chief Strategy Officer

Ultimately, without proper integration and analytics, no amount of data is useful. Pentaho's data integration tools give businesses and agencies the robust ETL capabilities needed to organize data for the next steps and business objectives. The Pentaho pipeline passes this information on to its business analytics suite, allowing businesses and agencies to gain the insights necessary to grow, target, and make vital decisions. Visual Connections has built services to help you unlock the full potential of the business analytics suite. We provide guidance on strategy and implementation so that you get the most out of Pentaho in all your data efforts.


Businesses and agencies looking to get more out of their data practice can rely on Visual Connections as their trusted Pentaho ETL advisor. Let’s discuss your data needs!


Frederick Deese

Fred Deese | Visual Connections President & CEO
After nearly a decade of providing advanced web and application development services to both the private and public sector, Fred Deese founded Visual Connections, an information technology consulting firm, in 2007. As Chief Executive Officer of the company, Fred is focused on new business development, overall company positioning, and client relationships.

Since the company's inception, he has held project management positions on many prominent and complex commercial and government IT projects, personally providing the services that have made Visual Connections widely recognized by its customers for its innovative technologies and quality work.