Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Counting the number of lines in Google Document

Problem:

I'd like to be able to count the number of lines in a Google Document. For example, the script must return 6 for the following text.

[Image: sample text occupying 6 lines in the document]

There doesn't seem to be any reliable method of extracting '\n' or '\r' characters from the text though.

text.findText(/\r/g)  //OR
text.findText(/\n/g)

The 2nd line of code is not supposed to work anyway, because according to the GAS documentation, 'new line characters are automatically converted to \r'.


1 Answer


If you are still looking for a solution, how about this answer? Unfortunately, I couldn't find a built-in method for retrieving the number of lines in a Google Document, so I would like to propose the following workaround.

If the end of each line can be detected, the number of lines can be retrieved. So I tried to obtain an end marker for each line using OCR. There may be several other workarounds for your issue, so please think of this as just one of them.

In Google Documents, when a sentence is longer than the page width, it automatically wraps onto the next line, but this line break is not stored as \r\n or \n. Only when the user inserts a line break with the Enter key does the text contain \r\n or \n. Because of this, the text data retrieved from the document contains only the line breaks entered by users. In your case, it seems that your document has line breaks only after "incididunt" and "consequat.", so counting them does not give 6.
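To illustrate the point above, here is a minimal sketch in plain JavaScript. The sample string and the function name countParagraphLines are hypothetical, and the sketch assumes the document text has already been retrieved as a single string in which only user-entered breaks appear.

```javascript
// Hypothetical sample: three paragraphs separated by the Enter key.
// The middle paragraph is long enough to wrap on screen, but the wrap
// points are not stored in the text, so no "\n" appears inside it.
var sampleText =
  "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt\n" +
  "ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n" +
  "Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore.";

// Counts the lines delimited by user-entered breaks ("\r\n", "\r" or "\n").
function countParagraphLines(text) {
  if (text === "") return 0;
  return text.split(/\r\n|\r|\n/).length;
}

// Depends only on the breaks typed by the user, not on how the text wraps.
var paragraphLines = countParagraphLines(sampleText);
```

The count reflects only the breaks in the string, which is why it cannot match the number of lines shown on the page once any paragraph wraps.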

I thought that OCR could be used in this situation. The flow is as follows.

  1. Convert Google Document to PDF.
  2. Convert PDF to text data using OCR.
    • I selected "ocr.space" for OCR.
      • If you already know another OCR API, you can try it instead.
    • When I used the OCR of Drive API, the line breaks (\r\n or \n) were not added to the converted text data, so I used ocr.space, which does add them.
  3. Count \n in the converted text data.
    • This number means the number of lines.
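Step 3 of the flow above can be sketched in plain JavaScript as follows. The function name countOcrLines and the sample string are hypothetical; the sketch assumes, as described in this answer, that the OCR output ends every line (including the last one) with a line break.

```javascript
// Counts visual lines in OCR output, assuming every line ends with "\n"
// (optionally preceded by "\r"), including the last line.
function countOcrLines(parsedText) {
  var breaks = parsedText.match(/\n/g); // match() returns null when nothing matches
  return breaks ? breaks.length : 0;
}

// Hypothetical OCR output for a document that wraps into three lines.
var ocrText = "first line\r\nsecond line\r\nthird line\r\n";
var lineCount = countOcrLines(ocrText);
```

Guarding against a null result from match() avoids an exception when the OCR returns empty text.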

The sample script for the above flow is as follows. To use it, please get your API key at "ocr.space": when you submit your information and email address in the form, you will receive an email containing the API key. Set it in the sample script. Please also check the API quota. I tested this with the Free plan.

Sample script :

var apikey = "### Your API key for using ocr.space ###";

// Export the active document as a PDF blob.
var id = DocumentApp.getActiveDocument().getId();
var url = "https://docs.google.com/feeds/download/documents/export/Export?id=" + id + "&format=pdf&access_token=" + ScriptApp.getOAuthToken();
var blob = UrlFetchApp.fetch(url).getBlob();

// Send the PDF to ocr.space and parse the JSON response.
var options = {method: "POST", headers: {apikey: apikey}, payload: {file: blob}};
var ocrRes = JSON.parse(UrlFetchApp.fetch("https://api.ocr.space/Parse/Image", options).getContentText());

// Count the "\n" characters in the text of the first parsed result.
var result = ocrRes.ParsedResults.map(function(e){return (e.ParsedText.match(/\n/g) || []).length})[0];
Logger.log(result)

Result :

When your sample text is used, the script returns 6.
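The sample script reads only the first element of ParsedResults. If ocr.space returns one element per PDF page (an assumption that should be checked against its documentation), a multi-page document could be handled by summing over all elements. The following is a minimal sketch in plain JavaScript; the function name countLinesInOcrResponse and the mock response are hypothetical.

```javascript
// Sums the "\n" counts over every element of ParsedResults.
// Assumes each element carries its text in ParsedText, as in the sample script.
function countLinesInOcrResponse(ocrRes) {
  var pages = ocrRes.ParsedResults || [];
  return pages.reduce(function (total, page) {
    var breaks = (page.ParsedText || "").match(/\n/g);
    return total + (breaks ? breaks.length : 0);
  }, 0);
}

// Hypothetical response with two pages (three lines and two lines).
var mockResponse = {
  ParsedResults: [
    { ParsedText: "a\r\nb\r\nc\r\n" },
    { ParsedText: "d\r\ne\r\n" }
  ]
};
var totalLines = countLinesInOcrResponse(mockResponse);
```

In the sample script, this function could take the place of the map(...)[0] expression, with ocrRes passed as the argument.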

Note :

  • Even if the last line of the document has no \r\n or \n, the converted text data has a line break at the end of every line.
  • In this case, the precision of OCR is not important. The important point is to retrieve the line breaks.

I tested this script with several documents. In my environment, the correct number of lines was retrieved. However, I'm not sure whether the script works in your environment. If it cannot be used there, I'm sorry.

