Sunday, 20 September 2015

Hadoop Interview Questions - Set 4

What is Hive ?

Hive is data warehouse software that facilitates querying and managing large data sets residing in distributed storage. Its query language, HiveQL, closely resembles SQL. Hive also lets traditional MapReduce programmers plug in custom mappers and reducers when it is inconvenient or inefficient to express the logic in HiveQL, and it can be extended with user-defined functions (UDFs).


What is Hive Metastore ?

The Hive metastore is a database that stores metadata about your Hive tables, such as table names, column names, data types, table locations and the number of buckets in each table.


What is the present version of Hive?

HIVE-0.13.1


What is the stable version of Hive ?

HIVE-0.12.0


Which Hadoop versions does the new Hive release support ?

This release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y and 2.x.y.


Where do we set the Hive installation path ?

We can set the Hive path in the ~/.bashrc file or in the hadoop-env.sh file.


Which is the better place to set the path, ~/.bashrc or hadoop-env.sh ?

~/.bashrc


Why is ~/.bashrc better than hadoop-env.sh ?

~/.bashrc is read whenever the user opens a new interactive shell, so the path is available as soon as you log in, whereas hadoop-env.sh is read only when Hadoop starts on the system.


What is the Hive installation path ?

export HIVE_HOME=/home/hadoop/work/hive-x.y.z
export PATH=$PATH:$HIVE_HOME/bin


How do you install Hive ?

See the Hive Installation tab in the menu above.


Which companies use Hive the most ?

Facebook, Netflix


Which company initially developed Hive ?

Facebook


How does Facebook use Hadoop, Hive and HBase ?

Facebook stores its data on HDFS; millions of photos are uploaded to Facebook every day with the help of Hadoop.
Facebook messages, likes and status updates run on top of HBase.
Hive is used to generate reports for third-party developers and advertisers who need to track the success of their applications or campaigns.


What is Apache HCatalog ?

Apache HCatalog is a table and data management layer for Hadoop. It is built on top of the Hive metastore and incorporates Hive’s DDL. Data registered in HCatalog can be processed with Apache Pig, Apache MapReduce and Apache Hive. With HCatalog, users do not need to worry about where the data is stored or in which format it was generated. HCatalog displays data from RCFile format, text files, or sequence files in a tabular view. It also provides REST APIs so that external systems can access these tables’ metadata.


What does Hive/HCatalog do ?

Hive/HCatalog also enables sharing of data structures with external systems, including traditional data management tools.


What is the WebHCat server ?

The WebHCat server provides a REST-like web API for HCatalog. Applications make HTTP requests to run Pig, Hive, and HCatalog DDL from within applications.
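
To make the REST-style interface more concrete, here is a minimal Python sketch that only builds the URL for a WebHCat DDL request that lists the tables in a database; it does not send anything. It assumes WebHCat's usual defaults (port 50111 and the /templeton/v1 base path). The host name "localhost", the user "hadoop" and the helper name webhcat_table_list_url are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode


def webhcat_table_list_url(host, user, database="default", port=50111):
    """Build the URL that asks WebHCat to list the tables in a database.

    Assumptions: WebHCat listens on its usual default port (50111) and
    serves its API under /templeton/v1. Host and user are placeholders.
    """
    # WebHCat expects the requesting user in the "user.name" query parameter.
    query = urlencode({"user.name": user})
    return f"http://{host}:{port}/templeton/v1/ddl/database/{database}/table?{query}"


# Hypothetical host and user, for illustration only.
url = webhcat_table_list_url("localhost", "hadoop")
print(url)
```

An application would issue an HTTP GET against this URL (for example with curl or an HTTP client library) and read the JSON response returned by the server.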


What is SerDe in Apache Hive ?

SerDe stands for Serializer/Deserializer. Hive uses a SerDe to read data from and write data to a Hive table. The important point is that Hive does not have its own Hadoop Distributed File System (HDFS) format in which data is stored. Users store the data on HDFS and make it available to Hive (with “CREATE EXTERNAL TABLE” or “LOAD DATA INPATH”), and Hive must then correctly “parse” that file format. The SerDe is what Hive uses to “parse” data stored in HDFS into a form Hive can work with.
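
The idea of a SerDe can be sketched in a few lines of Python. This is only a toy illustration of the concept, not Hive's actual SerDe interface (real SerDes are Java classes); the class name DelimitedSerDe and the sample columns are hypothetical.

```python
class DelimitedSerDe:
    """Toy SerDe: converts between a delimited text line and a typed row."""

    def __init__(self, columns, delimiter=","):
        # columns is a list of (column_name, type_constructor) pairs,
        # playing the role of the table schema.
        self.columns = columns
        self.delimiter = delimiter

    def deserialize(self, line):
        """Parse one stored line into a row (dict of column name -> typed value)."""
        fields = line.rstrip("\n").split(self.delimiter)
        return {name: cast(value) for (name, cast), value in zip(self.columns, fields)}

    def serialize(self, row):
        """Turn a row back into the delimited text form used in storage."""
        return self.delimiter.join(str(row[name]) for name, _ in self.columns)


# Hypothetical three-column table: id INT, name STRING, age INT.
serde = DelimitedSerDe([("id", int), ("name", str), ("age", int)])
row = serde.deserialize("1,alice,30\n")   # read path: file line -> row
line = serde.serialize(row)               # write path: row -> file line
print(row, line)
```

The read path corresponds to Hive deserializing a file record when a query scans the table, and the write path to Hive serializing rows when it writes query results into a table.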


Is it possible for multiple users to share the same metastore when Hive uses the embedded metastore?

No, the embedded metastore cannot be used in sharing mode. It is recommended to use a standalone “real” database such as MySQL or PostgreSQL.

Are multiline comments supported in Hive scripts ?

No

Difference between SQL and HiveQL ?





Hive Data types ?

