Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


mapreduce - Implementation of an ArrayWritable for a custom Hadoop type

How do I define an ArrayWritable for a custom Hadoop type? I am trying to implement an inverted index in Hadoop, with custom Hadoop types to store the data.

I have an IndividualPosting class which stores the term frequency, document ID, and list of byte offsets for the term in the document.

I have a Posting class which has a document frequency (the number of documents the term appears in) and a list of IndividualPosting instances.

I have defined a LongArrayWritable extending the ArrayWritable class for the list of byte offsets in IndividualPosting.

When I defined a custom ArrayWritable for IndividualPosting, I encountered some problems after local deployment (using Karmasphere, Eclipse).

All the IndividualPosting instances in the list in the Posting class would be the same, even though I get different values in the Reduce method.


1 Answer


From the documentation of ArrayWritable:

A Writable for arrays containing instances of a class. The elements of this writable must all be instances of the same class. If this writable will be the input for a Reducer, you will need to create a subclass that sets the value to be of the proper type. For example:

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}

You've already mentioned doing this with a type that Hadoop defines. Here's what I assume your LongArrayWritable implementation looks like:

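import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;

// Assumed to be nested in the job class, hence static: Hadoop creates instances reflectively.
public static class LongArrayWritable extends ArrayWritable
{
    // No-argument constructor, needed so the framework can instantiate the type when deserializing.
    public LongArrayWritable() {
        super(LongWritable.class);
    }
    // Convenience constructor that wraps an existing array of values.
    public LongArrayWritable(LongWritable[] values) {
        super(LongWritable.class, values);
    }
}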

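You should be able to do this with any type that implements Writable; ArrayWritable only requires its elements to be Writable, and WritableComparable matters only if the type is also used as a key. Using the example from the WritableComparable documentation: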

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class MyWritableComparable implements
        WritableComparable<MyWritableComparable> {

    // Some data
    private int counter;
    private long timestamp;

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Order instances by counter only.
    public int compareTo(MyWritableComparable other) {
        int thisValue = this.counter;
        int thatValue = other.counter;
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
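
The same write/readFields pattern would carry over to a type shaped like the question's IndividualPosting. The following is only a sketch under assumptions: PostingSketch and its field names are hypothetical, and it deliberately avoids the Hadoop classes so it can run on its own, mirroring the Writable contract with plain java.io. The point it illustrates is that readFields must read exactly what write emitted, in the same order, and should replace any array state rather than append to it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the question's IndividualPosting; field names are assumptions.
// It mirrors the write/readFields contract of Hadoop's Writable without depending on Hadoop.
class PostingSketch {
    int termFrequency;
    long documentId;
    long[] offsets = new long[0];

    // Serialize the fields in a fixed order; the array is length-prefixed.
    void write(DataOutput out) throws IOException {
        out.writeInt(termFrequency);
        out.writeLong(documentId);
        out.writeInt(offsets.length);
        for (long offset : offsets) {
            out.writeLong(offset);
        }
    }

    // Read the fields back in exactly the order write() emitted them.
    void readFields(DataInput in) throws IOException {
        termFrequency = in.readInt();
        documentId = in.readLong();
        // Allocate a fresh array so nothing from a previously read record survives.
        offsets = new long[in.readInt()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = in.readLong();
        }
    }

    // Serialize to bytes and deserialize into a new, independent instance.
    static PostingSketch roundTrip(PostingSketch source) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            source.write(new DataOutputStream(buffer));
            PostingSketch copy = new PostingSketch();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling PostingSketch.roundTrip on a populated instance should give back a distinct object with equal fields. In a real job the class would implement Writable and be wrapped by an ArrayWritable subclass in the same way as LongArrayWritable.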

And that should be that. This assumes you're using revision 0.20.2 or 0.21.0 of the Hadoop API.
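
One possible cause of the symptom in the question, offered as an assumption since the reducer code is not shown: Hadoop reuses the same value object while iterating over a reducer's values, so storing the reference in a list leaves every entry pointing at the last value read. The sketch below simulates that behaviour with plain Java; ReusedValue, ReuseDemo, and the method names are hypothetical and not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable value, standing in for a Writable such as IndividualPosting.
class ReusedValue {
    long documentId;

    // Returns an independent copy of this value.
    ReusedValue copy() {
        ReusedValue c = new ReusedValue();
        c.documentId = documentId;
        return c;
    }
}

class ReuseDemo {
    // Simulates an iterator that refills one shared instance on every next() call,
    // which is how reducer values behave in Hadoop.
    static Iterable<ReusedValue> reusingValues(long[] ids) {
        ReusedValue shared = new ReusedValue();
        return () -> new Iterator<ReusedValue>() {
            private int index = 0;

            public boolean hasNext() {
                return index < ids.length;
            }

            public ReusedValue next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.documentId = ids[index++];
                return shared;
            }
        };
    }

    // Problematic: every list entry is the same reference.
    static List<ReusedValue> collectReferences(Iterable<ReusedValue> values) {
        List<ReusedValue> result = new ArrayList<>();
        for (ReusedValue v : values) {
            result.add(v);
        }
        return result;
    }

    // Safer: each list entry is an independent copy.
    static List<ReusedValue> collectCopies(Iterable<ReusedValue> values) {
        List<ReusedValue> result = new ArrayList<>();
        for (ReusedValue v : values) {
            result.add(v.copy());
        }
        return result;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        System.out.println(collectReferences(reusingValues(ids)).get(0).documentId);
        System.out.println(collectCopies(reusingValues(ids)).get(0).documentId);
    }
}
```

With the ids 1, 2, 3, the first list ends up holding three references to one object whose documentId is 3, while the second keeps 1, 2, and 3. If this is what is happening in the reducer, copying each IndividualPosting before adding it to the list (for instance with a copy constructor) would be the corresponding fix.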

