Spark In-Memory Persistence and Memory Management

Spark’s performance advantage over MapReduce is greatest in use cases involving repeated computations. Much of this performance increase is due to Spark’s use of in-memory persistence. Rather than writing to disk between each pass through the data, Spark can keep the data loaded in memory on the executors. That way, the data in each partition is available in memory each time it needs to be accessed.

Spark offers three options for memory management: in-memory as deserialized data, in-memory as serialized data, and on disk. Each has different space and time advantages:

  1. In memory as deserialized Java objects
  2. As serialized data
  3. On disk

In memory as deserialized Java objects

The most intuitive way to store objects in RDDs is as the original deserialized Java objects that are defined by the driver program. This form of in-memory storage is the fastest, since it avoids serialization overhead; however, it may not be the most memory efficient, since it requires the data to be stored as full objects.

As serialized data

Using the standard Java serialization library, Spark objects are converted into streams of bytes as they are moved around the network. This approach may be slower, since serialized data is more CPU-intensive to read than deserialized data; however, it is often more memory efficient, since it allows the user to choose a more efficient representation. While Java serialization is more space efficient than full objects, Kryo serialization can be even more space efficient.
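The space and time trade-off between live objects and serialized bytes can be illustrated outside of Spark. The following is a minimal sketch in plain Python that uses the standard `pickle` module as a stand-in for Java or Kryo serialization; it is an analogy only, and the helper name `object_footprint` and the sample data are assumptions made for the illustration, not part of any Spark API.

```python
import pickle
import sys


def object_footprint(records):
    """Approximate bytes used by a list of ints kept as live objects.

    Counts the list itself plus each element; good enough for flat lists.
    """
    return sys.getsizeof(records) + sum(sys.getsizeof(r) for r in records)


# Hypothetical partition contents: 10,000 small integers.
records = list(range(1000, 11000))

# "Serialized" form: one compact byte buffer instead of many objects.
blob = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)

as_objects = object_footprint(records)
as_bytes = len(blob)

# Reading serialized data back costs CPU time: it must be decoded first.
restored = pickle.loads(blob)

print(f"live objects: ~{as_objects} bytes, serialized: {as_bytes} bytes")
```

The serialized buffer is considerably smaller, but every access requires a decoding step, which mirrors the trade-off described above.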

On disk

RDDs whose partitions are too large to be stored in RAM on each of the executors can be written to disk. This strategy is obviously slower for repeated computations but can be more fault-tolerant for long sequences of transformations, and may be the only feasible option for enormous computations.
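To make the idea of falling back to disk concrete, here is a small sketch in plain Python, not Spark code. The class `MemoryOrDiskStore`, its byte budget, and the file naming are all hypothetical; the sketch only shows the general pattern of keeping a partition in RAM when it fits and writing it to disk when it does not.

```python
import os
import pickle
import tempfile


class MemoryOrDiskStore:
    """Keeps pickled partitions in RAM until a byte budget is hit, then uses disk."""

    def __init__(self, memory_budget_bytes, spill_dir):
        self.memory_budget_bytes = memory_budget_bytes
        self.spill_dir = spill_dir
        self._in_memory = {}  # partition id -> pickled bytes
        self._on_disk = {}    # partition id -> file path
        self._used = 0

    def put(self, partition_id, rows):
        """Store a partition and report where it ended up."""
        blob = pickle.dumps(rows)
        if self._used + len(blob) <= self.memory_budget_bytes:
            self._in_memory[partition_id] = blob
            self._used += len(blob)
            return "memory"
        path = os.path.join(self.spill_dir, f"partition-{partition_id}.pkl")
        with open(path, "wb") as f:
            f.write(blob)
        self._on_disk[partition_id] = path
        return "disk"

    def get(self, partition_id):
        """Load a partition from wherever it was stored."""
        if partition_id in self._in_memory:
            return pickle.loads(self._in_memory[partition_id])
        with open(self._on_disk[partition_id], "rb") as f:
            return pickle.load(f)


with tempfile.TemporaryDirectory() as spill_dir:
    store = MemoryOrDiskStore(memory_budget_bytes=200, spill_dir=spill_dir)
    small_location = store.put(0, [1, 2, 3])          # fits in the budget
    large_location = store.put(1, list(range(1000)))  # too large, goes to disk
    small_rows = store.get(0)
    large_rows = store.get(1)
```

Reads of the large partition pay for file I/O on every access, which is why this option is slower for repeated computations.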

The persist() function in the RDD class lets the user control how the RDD is stored. By default, persist() stores an RDD as deserialized objects in memory, but the user can pass one of the numerous storage options to the persist() function to control how the RDD is stored. We will cover the different options for RDD reuse in “Types of Reuse: Cache, Persist, Checkpoint, Shuffle Files”. When persisting RDDs, the default implementation of RDDs evicts the least recently used partition (called LRU caching) if the space it takes is required to compute or to cache a new partition. However, you can change this behavior and control Spark’s memory prioritization with the persistencePriority() function in the RDD class.
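The LRU eviction policy mentioned above can be sketched in a few lines of plain Python. This is an illustration of the policy only, not Spark's implementation: the class `LruPartitionCache` is hypothetical, and its capacity is counted in partitions rather than bytes to keep the example short.

```python
from collections import OrderedDict


class LruPartitionCache:
    """Holds at most `max_partitions` entries, evicting the least recently used."""

    def __init__(self, max_partitions):
        self.max_partitions = max_partitions
        self._store = OrderedDict()  # oldest access first, newest last

    def get(self, partition_id):
        """Return cached data (or None) and mark the partition as recently used."""
        if partition_id not in self._store:
            return None
        self._store.move_to_end(partition_id)
        return self._store[partition_id]

    def put(self, partition_id, data):
        """Cache a partition and return the ids evicted to make room."""
        self._store[partition_id] = data
        self._store.move_to_end(partition_id)
        evicted = []
        while len(self._store) > self.max_partitions:
            old_id, _ = self._store.popitem(last=False)
            evicted.append(old_id)
        return evicted

    def cached_ids(self):
        """Partition ids currently cached, least recently used first."""
        return list(self._store)


cache = LruPartitionCache(max_partitions=2)
cache.put(0, ["a", "b"])
cache.put(1, ["c", "d"])
cache.get(0)                    # partition 0 is now the most recently used
evicted = cache.put(2, ["e"])   # no room left: partition 1 is evicted
```

Because partition 0 was read just before the new partition arrived, partition 1 is the one dropped, and it would have to be recomputed or reloaded on its next access.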
