This is less a question about Spark and more a question of how Scala generates code. Remember that a Scala object is pretty much a Java class full of static methods. Consider a simple example like this:
object foo {
  val value = 42
  def func(i: Int): Int = i + value
  def main(args: Array[String]): Unit = {
    println(Seq(1, 2, 3).map(func).sum)
  }
}
That will be translated to three Java classes; one of them is the closure that is a parameter to the map method. Using javap on that class yields something like this:
public final class foo$$anonfun$main$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
  public static final long serialVersionUID;
  public final int apply(int);
  public int apply$mcII$sp(int);
  public final java.lang.Object apply(java.lang.Object);
  public foo$$anonfun$main$1();
}
Note that the class has no fields. If you look at the disassembled bytecode, all it does is call the func() method. When running in Spark, this is the instance that gets serialized; since it has no fields, there's not much to serialize.
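To connect this to Spark: here's a sketch of the same computation over an RDD, assuming a SparkContext is in scope (the helper method and its name are mine, for illustration):

import org.apache.spark.SparkContext

def run(sc: SparkContext): Unit = {
  // The only thing serialized and shipped to executors is the field-less
  // anonfun instance shown above; each executor then resolves foo$.MODULE$
  // locally and calls func on its own copy of the singleton.
  val rdd = sc.parallelize(Seq(1, 2, 3))
  println(rdd.map(foo.func).sum())
}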
As for your question of how to initialize static objects: you can have an idempotent initialization function that you call at the start of each closure. The first call triggers the initialization, and subsequent calls are no-ops (see the sketch below). Cleanup, though, is a lot trickier, since I'm not familiar with an API that does something like "run this code on all executors".
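A minimal sketch of that idea; the Setup object, its method name, and the usage helper are made up for illustration:

import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.rdd.RDD

// One copy of this object exists per executor JVM.
object Setup {
  private val initialized = new AtomicBoolean(false)

  // Idempotent: only the first call in each JVM does the work.
  def ensureInitialized(): Unit =
    if (initialized.compareAndSet(false, true)) {
      // expensive one-time work: open connections, load config, etc.
    }
}

def processWithSetup(rdd: RDD[Int]): RDD[Int] =
  rdd.map { x =>
    Setup.ensureInitialized() // cheap no-op after the first call on each executor
    x + 1
  }

A lazy val inside an object achieves the same at-most-once-per-JVM behavior, since Scala guards its initialization with a thread-safe check.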
One approach that can be useful if you need cleanup is explained in this blog, in the "setup() and cleanup()" section.
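The gist of that pattern, as I recall it, is to do per-partition setup inside mapPartitions and defer cleanup until the partition's iterator is exhausted. A sketch, where Connection, createConnection, and process are hypothetical stand-ins for whatever resource you actually manage:

import org.apache.spark.rdd.RDD

// Hypothetical resource; stands in for a DB client, socket, etc.
class Connection { def close(): Unit = () }
def createConnection(): Connection = new Connection
def process(conn: Connection, x: Int): Int = x // placeholder work

def withSetupAndCleanup(rdd: RDD[Int]): RDD[Int] =
  rdd.mapPartitions { iter =>
    val conn = createConnection() // setup: once per partition
    new Iterator[Int] {
      def hasNext: Boolean = {
        val more = iter.hasNext
        if (!more) conn.close()   // cleanup: after the last element (close is assumed idempotent)
        more
      }
      def next(): Int = process(conn, iter.next())
    }
  }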
EDIT: just for clarification, here's the disassembly of the method that actually makes the call.
public int apply$mcII$sp(int);
  Code:
   0: getstatic     #29; //Field foo$.MODULE$:Lfoo$;
   3: iload_1
   4: invokevirtual #32; //Method foo$.func:(I)I
   7: ireturn
See how it just references the static field holding the singleton and calls the func() method; in Java terms, it's roughly foo$.MODULE$.func(i).