Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
369 views
in Technique[技术] by (71.8m points)

java - Sending non-ASCII text in HTTP POST header

I am sending a file to a server as an octet-stream, and I need to specify the filename in the header:

String filename = "?úü???.doc"
URL url = new URL("http://www.myurl.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.addRequestProperty("Accept", "application/json; charset=UTF-8");
conn.addRequestProperty("Content-Type", "application/octet-stream; charset=UTF-8");
conn.addRequestProperty("Filename", filename);
// do more stuff here

The problem is, some of the files I need to send have filenames that contain non-ASCII characters. I have read that you cannot send non-ASCII text in an HTTP header.

My questions are:

  1. Can you send non-ASCII text in an HTTP header?
  2. If you can, how do you do this? The code above does not work when filename contains non-ASCII text. The server responds with "Bad Request 400".
  3. If you cannot, what is the typical way to get around this limitation?
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You cannot use non ASCII character in HTTP headers, see the RFC 2616. URI are themselves standardized by RFC 2396 and don't permit non-ASCII either. The RFC says :

The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.

In order to use non ASCII characters in URI you need to escape them using the %hexcode syntax (see section 2 of RFC 2396).

In Java you can do this using the java.net.URLEncoder class.

2020 edit: RFC 2616 has been updated and the relevant section on header syntax is now at https://www.rfc-editor.org/rfc/rfc7230#section-3.2

 header-field   = field-name ":" OWS field-value OWS

 field-name     = token
 field-value    = *( field-content / obs-fold )
 field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
 field-vchar    = VCHAR / obs-text

 obs-fold       = CRLF 1*( SP / HTAB )
                ; obsolete line folding
                ; see Section 3.2.4

Where VCHAR is defined in https://www.rfc-editor.org/rfc/rfc7230#section-1.2 as "any visible [USASCII] character". With the [USASCII] reference being

[USASCII]     American National Standards Institute, "Coded Character
              Set -- 7-bit American Standard Code for Information
              Interchange", ANSI X3.4, 1986.

The standards are still very clear, HTTP header are still US-ASCII ONLY


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...