Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.3k views
in Technique[技术] by (71.8m points)

sql - Is there a way to load text data to database in PostgreSQL?

I want to extract information from a text file (almost 1GB) and store it in PostgreSQL database. Text file is in following format:

DEBUG, 2017-03-23T10:02:27+00:00, ghtorrent-40 -- ghtorrent.rb:Repo EFForg/https-everywhere exists
DEBUG, 2017-03-24T12:06:23+00:00, ghtorrent-49 -- ghtorrent.rb:Repo Shikanime/print exists
...

and I want to extract 'DEBUG', timestamp, 'ghtorrent-40', 'ghtorrent' and "Repo EFForg/https-everywhere exists" from each line and store it in database.

I have done it in using other languages like python (psycopg2) and C++ (libpqxx) but is it possible to write a function in PostgreSQL itself to import the whole data itself.

I am currenly using pgAdmin4 tool for the PostgreSQL. I thinking of using something like pg_read_file in function to read the file but one line at a time and insert it into the table.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

An approach I use with my large XML files - 130GB or bigger - is to upload the whole file into a temporary unlogged table and from there I extract the content I want. Unlogged tables are not crash-safe, but are much faster than logged ones, which totally suits the purpose of a temporary table ;-)

Considering the following table ..

CREATE UNLOGGED TABLE tmp (raw TEXT);

.. you can import this 1GB file using a single psql line from your console (unix)..

$ cat 1gb_file.txt | psql -d db -c "COPY tmp FROM STDIN" 

After that all you need is to apply your logic to query and extract the information you want. Depending on the size of your table, you can create a second table from a SELECT, e.g.:

CREATE TABLE t AS
SELECT 
  trim((string_to_array(raw,','))[1]) AS operation,
  trim((string_to_array(raw,','))[2])::timestamp AS tmst,
  trim((string_to_array(raw,','))[3]) AS txt
FROM tmp
WHERE raw LIKE '%DEBUG%' AND
      raw LIKE '%ghtorrent-40%' AND 
      raw LIKE '%Repo EFForg/https-everywhere exists%'

Adjust the string_to_array function and the WHERE clause to your logic! Optionally you can replace these multiple LIKE operations to a single SIMILAR TO.

.. and your data would be ready to be played with:

SELECT * FROM t;

 operation |        tmst         |                               txt                                
-----------+---------------------+------------------------------------------------------------------
 DEBUG    | 2017-03-23 10:02:27 | ghtorrent-40 -- ghtorrent.rb:Repo EFForg/https-everywhere exists
(1 Zeile)

Once your data is extracted you can DROP TABLE tmp; to free some disk space ;)

Further reading: COPY, PostgreSQL array functions and pattern matching


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...