edited 2017/01/10 - (original answer kept at the bottom)
edited 2017/01/10 - (again) - some (not all) of my problems with timeouts were caused by a disk failure.
Problems with the input data were handled by splitting the conversion operations. The code has now been changed to handle buffering in two different ways: for small files (by default, files up to 10MB) a memory stream is used to store the output, but for big files (greater than 10MB) a temporary file is used (see the notes after the code).
Option Explicit

Dim buffer
buffer = encodeFileBase64( "file.zip" )
WScript.StdOut.WriteLine( CStr(Len(buffer)) )

Private Function encodeFileBase64( file )
    ' Declare ADODB used constants
    Const adTypeBinary = 1
    Const adTypeText = 2
    ' Declare FSO constants
    Const TEMP_FOLDER = 2

    ' Initialize output
    encodeFileBase64 = ""

    ' Instantiate FileSystemObject
    Dim fso
    Set fso = WScript.CreateObject("Scripting.FileSystemObject")

    ' Check the input file exists
    If Not fso.FileExists( file ) Then
        Exit Function
    End If

    ' Determine how we will handle data buffering.
    ' Use a temporary file for large files
    Dim useTemporaryFile
    useTemporaryFile = fso.GetFile( file ).Size > 10 * 1048576

    ' Instantiate the B64 conversion component
    Dim b64
    Set b64 = WScript.CreateObject("Microsoft.XMLDOM").CreateElement("tmp")
    b64.DataType = "bin.base64"

    Dim outputBuffer, outputBufferName
    If useTemporaryFile Then
        ' Create a temporary file to be used as a buffer
        outputBufferName = fso.BuildPath( _
            fso.GetSpecialFolder( TEMP_FOLDER ), _
            fso.GetTempName() _
        )
        Set outputBuffer = fso.CreateTextFile( outputBufferName, True )
    Else
        ' Instantiate a text stream to be used as a buffer, to avoid the string
        ' concatenation operations that were generating out-of-memory problems
        Set outputBuffer = WScript.CreateObject("ADODB.Stream")
        With outputBuffer
            ' Two bytes per character, BOM-prefixed buffer
            .Type = adTypeText
            .Charset = "Unicode"
            .Open
        End With
    End If

    ' Instantiate a binary stream object to read the input file
    With WScript.CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile(file)
        ' Iterate over the input file, converting each block read to base64
        ' and appending the converted text to the output buffer
        Dim inputBuffer
        Do
            inputBuffer = .Read(3145716)
            If IsNull( inputBuffer ) Then Exit Do
            b64.NodeTypedValue = inputBuffer
            If useTemporaryFile Then
                Call outputBuffer.Write( b64.Text )
            Else
                Call outputBuffer.WriteText( b64.Text )
            End If
        Loop
        ' The input file has been read; close its associated stream
        Call .Close()
    End With

    ' It is time to retrieve the contents of the text output buffer into a
    ' string.
    If useTemporaryFile Then
        ' Close the output file
        Call outputBuffer.Close()
        ' Read all the data from the buffer file
        encodeFileBase64 = fso.OpenTextFile( outputBufferName ).ReadAll()
        ' Remove the temporary file
        Call fso.DeleteFile( outputBufferName )
    Else
        ' As we already have a Unicode string inside the stream, we will
        ' convert it to binary and retrieve the data directly with the
        ' .Read() method.
        With outputBuffer
            ' Type conversion is only possible at the start of the stream
            .Position = 0
            ' Change the stream type from text to binary
            .Type = adTypeBinary
            ' Skip the BOM
            .Position = 2
            ' Retrieve the buffered data
            encodeFileBase64 = CStr(.Read())
            ' Ensure we clear the stream contents
            .Position = 0
            Call .SetEOS()
            ' All done, close the stream
            Call .Close()
        End With
    End If
End Function
Will memory be a problem?
Yes. Available memory is still a limit. Anyway, I have tested the code with cscript.exe running as a 32-bit process on 90MB files, and in 64-bit mode on 500MB files, without problems.
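A rough back-of-the-envelope estimate of the memory needed (my own numbers, not a measurement): base64 expands the data by 4/3, VBScript strings are UTF-16 (two bytes per character), and at the end of the function the in-memory variant holds the encoded text twice, once in the stream and once in the returned string. A quick sketch of that arithmetic:

```python
def estimated_peak_mb(input_mb):
    """Rough peak-memory estimate for the in-memory (stream) variant.

    base64 encodes 3 input bytes into 4 output characters, VBScript strings
    are UTF-16 (2 bytes per character), and the encoded text exists twice
    at the end of the function (stream copy + returned string).
    """
    b64_chars = input_mb * 4 / 3    # characters of base64 output
    string_mb = b64_chars * 2       # one UTF-16 copy of that text
    return 2 * string_mb            # stream copy + returned string

print(round(estimated_peak_mb(90)))  # a 90MB input needs roughly 480MB
```

This ignores line breaks and object overhead, but it shows why a 90MB file is already near the comfortable limit of a 32-bit process.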
Why two methods?
The stream method is faster (all operations are done in memory, without string concatenations), but it requires more memory: at the end of the function there will be two copies of the same data, one inside the stream and one in the string being returned.
The temporary file method is slower, as the buffered data is written to disk, but since there is only one copy of the data, it requires less memory.
The 10MB limit used to decide whether or not to use a temporary file is just a pessimistic configuration to prevent problems in 32-bit mode. I have processed 90MB files in 32-bit mode without problems, but better safe than sorry.
Why is the stream configured as Unicode and the data retrieved via the .Read() method?
Because stream.ReadText() is slow. Internally it performs a lot of string conversions/checks (yes, the documentation warns about it) that make it unusable in this case.
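The byte layout behind the .Position = 2 trick can be illustrated outside VBScript (a Python sketch, purely for illustration): a "Unicode" (UTF-16LE) text buffer starts with a two-byte BOM, and skipping those two bytes leaves exactly the character data the function returns.

```python
import codecs

# Some base64 text, as it would sit inside the ADODB text stream
text = "SGVsbG8="
# "Unicode" charset means UTF-16LE with a BOM prefix, two bytes per character
buffer = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

assert buffer[:2] == b"\xff\xfe"          # the BOM skipped by .Position = 2
payload = buffer[2:]                       # what .Read() returns after the skip
assert payload.decode("utf-16-le") == text
```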
Below is the original answer. It is simpler and avoids the memory problem during the conversion but, for large files, it is not enough.
Split the read/encode process
Option Explicit

Const TypeBinary = 1

Dim buffer
buffer = encodeFileBase64( "file.zip" )
WScript.StdOut.WriteLine( buffer )

Private Function encodeFileBase64( file )
    ' Instantiate the B64 conversion component
    Dim b64
    Set b64 = WScript.CreateObject("Microsoft.XMLDOM").CreateElement("tmp")
    b64.DataType = "bin.base64"

    ' Use a Dictionary to collect the partial results and avoid
    ' repeated string concatenation
    Dim outputBuffer
    Set outputBuffer = WScript.CreateObject("Scripting.Dictionary")

    ' Read the input file in blocks, converting each block to base64
    With WScript.CreateObject("ADODB.Stream")
        .Open
        .Type = TypeBinary
        .LoadFromFile(file)
        Dim inputBuffer
        Do
            inputBuffer = .Read(3145716)
            If IsNull( inputBuffer ) Then Exit Do
            b64.NodeTypedValue = inputBuffer
            outputBuffer.Add outputBuffer.Count + 1, b64.Text
        Loop
        .Close
    End With

    ' Join all the partial conversions into the final string
    encodeFileBase64 = Join(outputBuffer.Items(), vbCrLf)
End Function
Notes:
No, it is not bulletproof. You are still limited by the space needed to construct the output string. For big files, you will need to use an output file, writing partial results until all the input has been processed.
3145716 is just the largest multiple of 54 (the number of input bytes encoded into each base64 output line) below 3145728 (3MB). Because each block ends exactly on a line boundary, the partial conversions can be concatenated safely.
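That arithmetic is easy to check (a Python sketch; the names and the small stand-in chunk size are mine): 3145716 is divisible by 54, and therefore by 3, so no block ever needs mid-stream base64 padding and the per-block encodings concatenate to the same result as encoding everything in one go.

```python
import base64

CHUNK = 3145716
assert CHUNK % 54 == 0          # each block yields whole base64 output lines
assert 3145728 - CHUNK < 54     # nearest such multiple below 3MB (3145728)

# Because the chunk size is a multiple of 3, chunked encoding produces
# no '=' padding except at the very end, matching a single-pass encoding.
data = bytes(range(256)) * 1000          # 256000 bytes of sample data
chunk = 54                                # small stand-in for 3145716
parts = [base64.b64encode(data[i:i + chunk])
         for i in range(0, len(data), chunk)]
assert b"".join(parts) == base64.b64encode(data)
```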