Search This Blog

Saturday, February 20, 2016

Hive 2.0 includes HPL/SQL

HPL/SQL (formerly PL/HQL) is a language translation and execution layer developed by Dmitry Tolpeko.  It was introduced into the Hive source code in June, 2015 (JIRA-11055) and included this February, 2016 in Hive 2.0.  However, it doesn't need Hive to function.

http://www.hplsql.org/
Let me introduce PL/HQL, an open source tool that implements procedural SQL
can be used with any SQL-on-Hadoop solution.
Motivation:
- Writing the driver code using well-known procedural SQL (not bash)
that enables Hadoop to even more wider audience
- Allowing dynamic SQL, iterations, flow-of-control and SQL exception
handling
- Facilitating migration of RDBMS workload to Hadoop
Plans (besides extending syntax):
- Supporting CREATE PROCEDURE/FUNCTION/PACKAGE to reuse code
- Allowing connections to multiple databases (i.e. lookup tables in
relational databases)
- On-the-fly SQL conversion (SELECT i.e.), compatibility layer

Current steps to install in a Hortonworks HDP 2.3.2 environment.  Substitute for the version you are using.

Download and Install
tar xvf hplsql-0.3.13.tar.gzz /usr/hdp/2.3.2.0-2950/
ln -s /usr/hdp/2.3.2.0-2950/hplsql-0.3.13/ /usr/hdp/current/hplsql

Configure HADOOP_CLASSPATH
Edit /usr/hdp/current/hplsql/hplsql
Replace /usr/lib/ with /usr/hdp/2.3.2.0-2950/

Add to Path (in this case globally)
echo "PATH=${PATH}:/usr/hdp/current/hplsql" > /etc/profile.d/hplsql-path.sh && chmod 755 /etc/profile.d/hplsql-path.sh

Configure plhql-site.xml
To configure Hive connection settings, and connectivity to other databases (mySQL, Teradata, IBM DB/2, Oracle, MSSQL)

Test from Command Line
hplsql --version

Get the previous date:
START=$(hplsql -e 'CURRENT_DATE - 1')
Copy table to a file:
COPY (SELECT id, name FROM sales.users WHERE local_dt = CURRENT_DATE) 
  TO /data/users.txt DELIMITER '\t';
Copy table from default connection (Hive) to Teradata connection
COPY sales.users TO sales.users2 AT tdconn;
Log to mySQL from Hive
MAP OBJECT log TO log.log_data AT mysqlconn;
 
DECLARE cnt INT;
SELECT count(*) INTO cnt FROM sales.users WHERE local_dt = CURRENT_DATE;
 
INSERT INTO log (message) VALUES ('Number of users: ' || cnt);  
Compare Hive table totals to mySQL
CMP SUM sales.users WHERE local_dt = CURRENT_DATE, users_daily AT mysqlconn;  
Great addition to the Hive codebase.