
A First Look at MapReduce Use Cases (with Java and Python Code)

2019-03-01 08:34

Python進階學習交流 (Python Advanced Learning Exchange)


Java version

First, prepare a dataset of words that have already been segmented. Here the file is a plain txt file named WordMRDemo.txt, and it contains the short sentence below, with the words separated by spaces:

hello my name is spacedong welcome to the spacedong thank you

Add the Hadoop dependencies

<!-- Version 2.6.5 is used here; you can use a different version -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.5</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.5</version>
</dependency>


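Create WordMapper.java. This code splits each input line into words on whitespace.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // StringTokenizer splits on whitespace by default
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            // Map output: emit (word, 1) for every token
            context.write(new Text(word), new IntWritable(1));
        }
    }
}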
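To make the mapper's output concrete, here is a minimal Python sketch of the same map step applied to the sample sentence. It is not Hadoop code; the function name `map_line` is hypothetical and only illustrates the idea of emitting one (word, 1) pair per token.

```python
def map_line(line):
    """Emit a (word, 1) pair for every whitespace-separated token in the line."""
    return [(word, 1) for word in line.split()]


sample = "hello my name is spacedong welcome to the spacedong thank you"
pairs = map_line(sample)

# Print each pair the way a mapper would hand it to the framework
for word, count in pairs:
    print(word, count)
```

Note that "spacedong" is emitted twice, each time with a count of 1; adding them up is left to the reduce step.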

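Create WordReduce.java, which counts how many times each word occurs.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted for this word
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        context.write(key, new IntWritable(sum));
    }
}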
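Between the map and reduce phases, the framework groups the map output by key, so each reduce call sees one word together with all of its counts. The following Python sketch illustrates that grouping and the summation under simplified assumptions; `shuffle` and `reduce_counts` are hypothetical names, not Hadoop APIs.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group map output by key, roughly what happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Sum the counts for one key, mirroring the loop in reduce()."""
    return key, sum(values)


pairs = [("hello", 1), ("spacedong", 1), ("to", 1), ("spacedong", 1)]
results = [reduce_counts(k, v) for k, v in sorted(shuffle(pairs).items())]
print(results)
```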

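Create WordMRDemo.java, which runs the Job and starts analyzing the sentence.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordMRDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // MapReduce setting, i.e. the value configured in hadoop/conf/mapred-site.xml
        conf.set("mapred.job.tracker", "hadoop:9000");
        try {
            // Create a new Job (Job.getInstance replaces the deprecated constructor)
            Job job = Job.getInstance(conf);
            // Class used to locate the job jar
            job.setJarByClass(WordMRDemo.class);
            // Mapper class to run
            job.setMapperClass(WordMapper.class);
            // Reducer class to run
            job.setReducerClass(WordReduce.class);
            // Key and value types of the map output
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // Key and value types of the final (reduce) output
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Number of reduce tasks (default is 1). More reducers add parallelism,
            // but each one has overhead and writes its own part file.
            // job.setNumReduceTasks(2);
            // Input file or directory for the job; note that a directory is allowed here.
            FileInputFormat.addInputPath(job, new Path("F:/BigDataWorkPlace/data/input"));
            // Output directory; it must not already exist, otherwise the job fails.
            FileOutputFormat.setOutputPath(job, new Path("F:/BigDataWorkPlace/data/out"));
            // Wait for the job to finish, then exit
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
}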

Finally, run WordMRDemo.java. The results are written to the out folder, which looks like this:

(Figure: listing of the out directory)

Opening the part-r-00000 file shows the following content:

(Figure: contents of the part-r-00000 file)
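As a cross-check on what part-r-00000 should contain for the sample sentence, the counts can be recomputed locally. This is a minimal sketch using Python's `collections.Counter` instead of Hadoop. It assumes a single reducer, so the keys come out in sorted order, and the tab separator that Hadoop's text output uses by default.

```python
from collections import Counter

sentence = "hello my name is spacedong welcome to the spacedong thank you"
counts = Counter(sentence.split())

# One "word<TAB>count" line per distinct word, in sorted key order
lines = ["{}\t{}".format(word, counts[word]) for word in sorted(counts)]
print("\n".join(lines))
```

Every word should have a count of 1 except "spacedong", which appears twice in the sentence.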

Python version

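Create map.py, which splits each line into words.

import sys

for line in sys.stdin:
    # Words in the input are separated by single spaces
    ss = line.strip().split(' ')
    for word in ss:
        if word:
            # Emit "word<TAB>1"; tab is the default key/value separator in Hadoop Streaming
            print('\t'.join([word.strip(), '1']))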

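Create red.py, which counts the words.

import sys

cur_word = None
total = 0
for line in sys.stdin:
    # Each input line is "word<TAB>count"; skip malformed lines
    ss = line.strip().split('\t')
    if len(ss) != 2:
        continue
    word, cnt = ss
    if cur_word is None:
        cur_word = word
    if cur_word != word:
        # Input arrives sorted by key, so a new word means the previous one is complete
        print('\t'.join([cur_word, str(total)]))
        cur_word = word
        total = 0
    total += int(cnt)
# Flush the count for the last word, if there was any input
if cur_word is not None:
    print('\t'.join([cur_word, str(total)]))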
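The reducer only works because Hadoop Streaming sorts the mapper output by key before feeding it to the reducer, so identical words arrive on consecutive lines. The sketch below imitates that map, sort, reduce pipeline in a single Python process, with `sorted()` standing in for Hadoop's sort step and `itertools.groupby` replacing the manual `cur_word` bookkeeping. The function names `run_mapper` and `run_reducer` are hypothetical.

```python
from itertools import groupby


def run_mapper(lines):
    """Yield one 'word<TAB>1' line per whitespace-separated word."""
    for line in lines:
        for word in line.strip().split():
            yield "{}\t1".format(word)


def run_reducer(sorted_lines):
    """Sum counts of consecutive lines that share a key (input must be sorted)."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "{}\t{}".format(word, sum(int(cnt) for _, cnt in group))


text = ["hello my name is spacedong welcome to the spacedong thank you"]
output = list(run_reducer(sorted(run_mapper(text))))
print("\n".join(output))
```

If the `sorted()` call is dropped, the two "spacedong" lines are no longer adjacent and the word is reported twice with a count of 1 each, which is the same failure the streaming reducer would show on unsorted input.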

Create run.sh and run it directly.

HADOOP_CMD="/usr/local/src/hadoop-2.6.5/bin/hadoop"
STREAM_JAR_PATH="/usr/local/src/hadoop-2.6.5/share/hadoop/tools/lib/hadoop-streaming-2.6.5.jar"
INPUT_FILE_PATH_1="/test.txt"
OUTPUT_PATH="/output"

# Remove any previous output directory; the job fails if it already exists
$HADOOP_CMD fs -rm -r -skipTrash $OUTPUT_PATH

# Step 1.
$HADOOP_CMD jar $STREAM_JAR_PATH \
    -input $INPUT_FILE_PATH_1 \
    -output $OUTPUT_PATH \
    -mapper "python map.py" \
    -reducer "python red.py" \
    -file ./map.py \
    -file ./red.py

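The code above is the core of the demo; the complete code is available in the GitHub repository.

GitHub address: http://github.com/cassieeric/bigDaaNotes

This article is the first in the MapReduce series. The next one will cover the MapReduce programming model, so stay tuned!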

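Bonus

After reading this, do you have a basic understanding of MapReduce? To finish, here is an e-book for everyone: "Hadoop Internals: An In-Depth Analysis of MapReduce Architecture Design and Implementation Principles". Reply with the keyword MapReduce in the official account's message box to get it.

References:

Hadoop Internals: An In-Depth Analysis of MapReduce Architecture Design and Implementation Principles

Header image: cosmin Paduraru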
