How can I create an array of RDD in pyspark?


Vote count: 0

I am trying to compute the cartesian product of 1.5 million records (7 columns) with itself. I am currently launching the job with this configuration:

./pyspark --master yarn-client --num-executors 50 --name "LND_TEST" --conf "spark.executor.cores=2" --conf "spark.executor.memory=14g" --conf "spark.driver.memory=14g" --conf "spark.shuffle.compress=true" --conf "spark.io.compression.codec=org.apache.spark.io.LZ4CompressionCodec"

However, I am not able to scale the application. I would like to split the data into 15 RDDs and perform this computation incrementally.
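A minimal local sketch of that incremental approach, using plain Python as a stand-in for Spark (in PySpark the analogous calls would be `RDD.randomSplit` to produce the chunks and `chunk.cartesian(full_rdd)` for each pairing; the names `records` and `NUM_CHUNKS` below are illustrative, not from the question):

```python
from itertools import product

NUM_CHUNKS = 15          # the question mentions splitting into 15 RDDs
records = list(range(100))  # small stand-in for the 1.5M rows

# Split the data into roughly equal chunks.
# In PySpark: chunks = rdd.randomSplit([1.0] * NUM_CHUNKS)
chunks = [records[i::NUM_CHUNKS] for i in range(NUM_CHUNKS)]

pair_count = 0
for chunk in chunks:
    # Each pass pairs one chunk against the full dataset, so only
    # len(chunk) * len(records) pairs are in flight at a time instead
    # of the full len(records) ** 2 product.
    # In PySpark: partial = chunk_rdd.cartesian(full_rdd)
    for pair in product(chunk, records):
        pair_count += 1

# All passes together still cover the complete cartesian product.
assert pair_count == len(records) ** 2
```

Each pass can be processed (and its results persisted) independently, which bounds the shuffle size of any single stage; the union of all passes equals the full self-cartesian product.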

asked 50 secs ago
