Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Skewed dataset join in Spark?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I am joining two large datasets using Spark RDDs. One dataset is heavily skewed, so a few executor tasks take much longer than the rest to finish. How can I solve this?

1 Answer

answered Oct 17, 2021 by 深蓝 (71.8m points)

This article explains well how it can be done: https://datarus.wordpress.com/2015/05/04/fighting-the-skew-in-spark/

Short version:

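Add a random element (a salt) to each key in the large RDD and use the combined value as the new join key.
Replicate each entry in the small RDD with explode/flatMap, once per possible salt value, and build the same kind of new join key for every copy.
Join the RDDs on the new join key; the random salt spreads the rows of a skewed key across more partitions, so the work is distributed more evenly.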
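The three steps above can be sketched in plain Python, without a Spark cluster, to show that salting keeps the join result intact while breaking up a hot key. This is only an illustration under stated assumptions: `NUM_SALTS`, `salt_large`, `replicate_small`, `hash_join` and `salted_join` are hypothetical names, and the list comprehensions stand in for what would be `map`, `flatMap` and `join` on RDDs.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 4  # assumed replication factor; tune it to the degree of skew


def salt_large(records, num_salts=NUM_SALTS, rng=random):
    """Large side: pair each key with a random salt (a `map` on the RDD)."""
    return [((key, rng.randrange(num_salts)), value) for key, value in records]


def replicate_small(records, num_salts=NUM_SALTS):
    """Small side: emit one copy per salt value (a `flatMap` on the RDD)."""
    return [((key, salt), value)
            for key, value in records
            for salt in range(num_salts)]


def hash_join(left, right):
    """Inner join on key; a local stand-in for an RDD `join`."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in index.get(key, [])]


def salted_join(large, small, num_salts=NUM_SALTS, rng=random):
    """Join on (key, salt), then drop the salt to recover the original key."""
    joined = hash_join(salt_large(large, num_salts, rng),
                       replicate_small(small, num_salts))
    return [(key, pair) for (key, _salt), pair in joined]


# Toy data: the key "hot" dominates the large side.
large = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
small = [("hot", "H"), ("cold", "C")]

plain = hash_join(large, small)
salted = salted_join(large, small, rng=random.Random(0))

# Rows per join key after salting: "hot" is now split over several keys.
rows_per_join_key = Counter(
    key for key, _ in salt_large(large, rng=random.Random(0)))
```

The trade-off is that the small side grows by a factor of `NUM_SALTS`, so the approach fits best when that side is small enough to replicate cheaply.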

CC BY-SA 3.0