Tuesday, November 13, 2012

Storing Apache Hadoop WordCount Example Output to Database

The Apache Hadoop WordCount example is the "Hello World" of Hadoop, which makes it an easy way to understand how to sink Hadoop job output to a database. The database I used is MySQL, and the DDL for the table is as follows:

CREATE TABLE word_count(word VARCHAR(254), count INT);
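
As a quick local preview of the rows this table is meant to hold, the counting can be sketched in plain Java without Hadoop. This is only an illustration: WordCountPreview and countWords are hypothetical names, and the sample sentences are made up. It tokenizes on whitespace with StringTokenizer and sums one per occurrence, which is the same idea the Hadoop job in this post applies at scale.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local preview, not part of Hadoop: computes the (word, count)
// pairs that would end up as rows in the word_count table.
class WordCountPreview {

    // Splits each line on whitespace and adds 1 to the running total of each word.
    static Map<String, Integer> countWords(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input; each printed line corresponds to one table row.
        countWords("hello hadoop", "hello database")
            .forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Tokens are case sensitive and keep their punctuation, so "Hadoop" and "hadoop," would be counted as separate words.
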
After creating the table, I wrote the following Apache Hadoop job, along with its Mapper and Reducer, to sink the output to the database. It uses DBOutputFormat as the OutputFormat and DBConfiguration to specify the database connection parameters.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class WordCount {
    public static class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, DBOutput, IntWritable> {
        // Reused for every token to avoid allocating new objects per record.
        private static final IntWritable one = new IntWritable(1);
        private static final DBOutput text = new DBOutput();
        @Override
        public void map(LongWritable key, Text value,
                OutputCollector<DBOutput, IntWritable> collect, Reporter arg3)
                throws IOException {
            StringTokenizer token = new StringTokenizer(value.toString());
            while(token.hasMoreTokens()) {
                text.setText(token.nextToken());
                collect.collect(text, one);
            }            
                        
        }
        
    }
    
    public static class WordCountReducer extends MapReduceBase implements Reducer<DBOutput, IntWritable, DBOutput, IntWritable> {

        
        @Override
        public void reduce(DBOutput key, Iterator<IntWritable> values,
                OutputCollector<DBOutput, IntWritable> collect, Reporter arg3)
                throws IOException {
            int sum = 0;
            IntWritable no = null;
            DBOutput dbKey = new DBOutput();
            
            while(values.hasNext()) {
                no = values.next();
                sum += no.get();
            }
            // DBOutputFormat writes only the key to the database, so the sum is stored in the key itself.
            dbKey.setText(key.getText());
            dbKey.setNo(sum);
            collect.collect(dbKey, new IntWritable(sum));
            
        }
        
    }
    
    public void run(String inputPath, String outputPath) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
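        // Put the MySQL JDBC driver on the tasks' classpath; replace <Absolute Path> with the jar's location.
        DistributedCache.addFileToClassPath(new Path("<Absolute Path>/mysql-connector-java-5.1.7-bin.jar"), conf);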

        // the keys are DBOutput
        conf.setOutputKeyClass(DBOutput.class);
        // the values are counts (ints)
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);        
        
        conf.setOutputFormat(DBOutputFormat.class);

        FileInputFormat.addInputPath(conf, new Path(inputPath));
        DBOutputFormat.setOutput(conf, "word_count", "word", "count");
        
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://localhost:3306/sample", "root", "root");
        
        // No file output path is set: DBOutputFormat writes to the database, so outputPath is unused.

        JobClient.runJob(conf);
      }
    
    public static void main(String[] args) throws Exception {
        WordCount wordCount = new WordCount();
        wordCount.run(args[0], args[1]);
    }
    
    // Public so that Hadoop can instantiate the key type reflectively without access restrictions.
    public static class DBOutput implements DBWritable, WritableComparable<DBOutput> {
        
        private String text;
        
        private int no;

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            text = rs.getString("word");
            no = rs.getInt("count");
        }

        @Override
        public void write(PreparedStatement ps) throws SQLException {
            ps.setString(1, text);
            ps.setInt(2, no);
        }
        
        public void setText(String text) {
            this.text = text;
        }
        
        public String getText() {
            return text;
        }
        
        public void setNo(int no) {
            this.no = no;
        }
        
        public int getNo() {
            return no;
        }

        @Override
        public void readFields(DataInput input) throws IOException {
            text = input.readUTF();    
            no = input.readInt();
        }

        @Override
        public void write(DataOutput output) throws IOException {
            output.writeUTF(text);
            output.writeInt(no);
        }

        @Override
        public int compareTo(DBOutput o) {
            return text.compareTo(o.getText());
        }
        
    }
}
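
To see why write(PreparedStatement) sets parameter 1 to the word and parameter 2 to the count, it helps to picture the parameterized INSERT implied by the table and field names passed to DBOutputFormat.setOutput. The sketch below is not Hadoop's code: InsertStatementSketch and insertStatement are hypothetical names, and the exact SQL text Hadoop generates internally may differ between versions. It only shows the assumed shape, with one placeholder per field in the order the fields were given.

```java
import java.util.Collections;

// Hypothetical helper, not Hadoop's API: rebuilds the kind of parameterized
// INSERT that a table name plus an ordered list of field names implies.
class InsertStatementSketch {

    // One "?" placeholder is emitted per field, in the same order as the fields.
    static String insertStatement(String table, String... fields) {
        String columns = String.join(", ", fields);
        String placeholders = String.join(", ", Collections.nCopies(fields.length, "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Same table and field names as in the job above.
        System.out.println(insertStatement("word_count", "word", "count"));
    }
}
```

With this shape in mind, the first placeholder lines up with the word column and the second with the count column, which is why the order of the setString and setInt calls has to match the order of the field names.
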
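Furthermore, I have written a custom Hadoop type for the key, which implements DBWritable and WritableComparable, and used it as the output key class. The second command-line argument is required by main but is not used, because the output goes to the database instead of a folder. The command to run the job is as follows: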
./bin/hadoop jar <Path to Jar>/HadoopTest.jar WordCount <Input Folder> <Dummy Output Folder>
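
The custom key type has to serialize and deserialize its fields in the same order, because Hadoop moves keys between the map and reduce phases as bytes. A minimal sketch of that round trip, using only the JDK, is shown below. WordKey is a hypothetical stand-in for the DBOutput class above; it deliberately does not implement Hadoop's interfaces so that it can be run without a cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for the post's DBOutput key, reduced to its
// serialization and ordering logic so it runs with the JDK alone.
class WordKey implements Comparable<WordKey> {
    String text = "";
    int no;

    // Fields are written in a fixed order: text first, then the count.
    void write(DataOutput out) throws IOException {
        out.writeUTF(text);
        out.writeInt(no);
    }

    // Fields must be read back in exactly the order they were written.
    void readFields(DataInput in) throws IOException {
        text = in.readUTF();
        no = in.readInt();
    }

    // Ordering uses the word only, so keys with equal words compare as equal.
    @Override
    public int compareTo(WordKey other) {
        return text.compareTo(other.text);
    }

    // Serializes a key to bytes and reads it back into a fresh instance.
    static WordKey roundTrip(WordKey key) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        key.write(new DataOutputStream(bytes));
        WordKey copy = new WordKey();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        WordKey key = new WordKey();
        key.text = "hadoop";
        key.no = 3;
        WordKey copy = roundTrip(key);
        System.out.println(copy.text + " " + copy.no);
    }
}
```

Comparing on the word alone mirrors the compareTo in the job above. Note also that writeUTF throws a NullPointerException for a null string, so the text field should be set before a key is emitted.
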

46 comments:

lotus said...

Hi,

I just followed your blog and was able to put the data into the database as the job was supposed to, but I now want to read the data back and am currently facing a problem with it. It would be of great help if you could post a job to retrieve the same data that you put in the DB.

Shazin Sadakath said...

Hi Lotus,

You can either use Sqoop to retrieve the data from your database as flat files and use them as input to your MapReduce job, or you can write your own custom DBInput.

ROHIT RANJAN said...

Thanks man. Saved the day !!!

ROHIT RANJAN said...

Hey, can you write a similar tutorial on taking data from a database? It would be very helpful!

Punit said...

I compiled this example and after map phase is 50% completed, I get an error which says that "DBOutput cannot be cast to DBWritable".
Please help
