CSV file upload with ZIP containers, jQuery, Ajax and PHP 5.4 progress tracking – I

This article series is written in support for French colleagues in a PHP collaboration project and therefore in English. I want to describe some basic elements of an

Ajax controlled file upload process between

  • a browser based User Interface (HTML4/5, Javascript, jQuery)
  • and some PHP/MySQL application programs on a LAMP server.

Our customer’s project depends on a periodic transfer of up to 40 different CSV files with a lot of input data (around 0.5 GByte) to a database server. A requirement of our customer was that the data transfer should be performed with a ZIP file as a container for the individual CSV data files. After the transfer the contents of the individual CSV files should be imported into specific tables of a database.

As our whole interface of the web application is Ajax based, we decided to control all transfers via jQuery’s Ajax API. Meanwhile, there are jQuery Plugins available for this type of task. However, we wanted to fully control all important phases of the file transfer and the data import – both on the client (browser) as well as on the server. This meant that we needed to program all basic steps during an Ajax communication cycle between the browser client and the LAMP server by ourselves. In addition we needed to guarantee some error control.

Personally, I found it a bit astonishing that such a seemingly simple task lead me to some relatively intricate obstacles to overcome. Although most of the necessary ingredients are documented on the Internet, the documentation is sparse and distributed. My objective with this article series is to provide a coherent picture of process design aspects, some coding tricks and also limitations of such a process. This may be useful also for other developers having to solve similar problems. However, if you want to read about a most simple and problem free approach to file upload tasks with Ajax you are probably looking at the wrong article.

Objectives and Assumptions

  • We want to upload several CSV-files (up to 40), whose contents shall be transferred to specific database tables.
  • These files shall be sent to the server in one ZIP file. Reasons for using a ZIP-container file: compression; limitations of the HTML4 file upload API.
  • As the Zip-container may get relatively large we want to see and control the transfer progress over the Internet by some means of PHP 5.4 – as far as possible today.
  • The server shall extract the files from the ZIP and build up a “pipeline” of these files for a subsequent database import of their contents.
  • The data import into database tables shall be done by a sequence of Ajax controlled PHP jobs. Reason: Intermediate information transfer to the client with the option to stop further processing.
  • The server shall decide by some naming conventions what to do with each file.
  • All steps shall be Ajax controlled – a relatively continuous flow of information between client and server has to be established.
  • For the sake of simplicity each Ajax answer of the server at the end of each controlled Ajax transaction cycle shall be encoded in form of a JSON object. (So, if you want to be particularly precise: we use Ajaj instead of Ajax.)

Wording used in this article series

  • JS, jQuery, Ajax:JS below stands for Javascript (on the
    client side). We furthermore use jQuery and its Ajax interface functionality. We expect JSON responses from the server. Although not completely correct we nevertheless use Ajax and Ajaj as synonyms in the articles of this series.
  • Upload: By “upload” we normally mean the whole process. It comprises a “file transfer process” from the Web client PC to the server and subsequent “database import processes”. However, sometimes and for reasons of simplicity we also use the expression “file upload” in a restricted sense – namely for the file transfer to the server, only. It should become clear from the context what we mean.
  • Main PHP program/job:
    The PHP program receiving and working with the transferred Zip-container file and its contents is called “main PHP program” or “main PHP job”. It has to be distinguished from “polling jobs” (see below).
  • Polling jobs: A sequence of additional PHP “polling jobs” may be triggered by the client. This is done in form of a time loop with a short period. A “polling job” on the LAMP server reads some status information of a previously started and still running “Main PHP job” (as e.g. the file transfer job or long lasting database import jobs). The status information of the running main job is fetched from a common data source as the $_SESSION or a database table accessible to both the status writing “main PHP job” and the status reading “polling job”. Each short timed “polling job” fulfills its own complete Ajax transaction cycle. The evaluation of the Ajax response triggers the next polling job if the main PHP job is still running. We come back to the concept of “Ajax driven polling jobs” later on.
  • CtrlOs: Each user interaction area of a web page – e.g. a HTML FORM in a DIV container – shall be completely controlled by a so called JS “Control object” [CtrlO]. A CtrlO encapsulates all reactions of the UI to events in well defined prototype methods. A CtrlO uses jQuery’s proxy mechanism to register events and delegate event handling to defined CtrlO methods. CtrlO methods furthermore control the Ajax communication with the server.
  • Phases: “Phases” describe a full cycle of defined Ajax interaction between client and server. An example of such a full cycle would be:

    HTML Form => Ajax Setup via JS CtrlO method => Submit via JS CtrlO method => POST/FILE data transfer => Server action (PHP) => JSON object as Ajax response => Client analysis of the JSON object via CtrlO method for the Ajax response

  • Client: The client is in our case typically a browser (Firefox) with active JS and jQuery. We do not care about specific requirements of MS IE browsers in this article series; but we assume that at least MS IE browsers > 10 should work.
  • Pipeline: The ordered sequence by which the files of the transferred ZIP container are imported into their related database.

Relevant phases

To get a more detailed overview over what is to be done we distinguish the following main phases and steps (I omit error handling in this overview which may occur at every step):

Phase I – file transfer, progress control and Zip extraction

Step I.1 – Client: Use a HTML form to choose a ZIP file (<input type=”file”>) and use methods of a specifically designed JS
Control object [CtrlO] to control subsequent actions on the client. Add parameter data (hidden input fields) and prepare an Ajax transaction for the file upload (=transfer) process.

Step I.2 – Client: Start the transfer the ZIP file over the Internet to the server. Submit a special parameter in addition to the file to trigger the provision of transfer progress information on the server. Prepare and start the Ajax communication and the data transfer by a CtrlO method.

Step I.3 – Server: Initialize the progress measurement and provide progress data in the $_SESSION array.

Step I.4 – Client/Server: Initiate a sequence of Ajax polling jobs via a JS time loop for reading the progress information on the server. Handle the Ajax response of each polling job in separate defined methods of a special CtrlO. React to error situations and stop the polling job time loop in case of errors or when the file transfer has finalized.

Step I.5 – Server: Extract, expand and save the CSV files from the Zip-container into a special upload folder on the server. This is done by using standard methods of the PHP ZIP class. Define/Suggest a sequence of imports of the data contents of the different files into file specific database tables. This defined sequence may be controlled via an array (“DB Import Control array” = DBIC-array ) which is kept and updated in the $_SESSION array on the server AND which is also sent back via a JSON object to the client.

Step I.6 – Server: Prepare and send an Ajax response in form of a JSON object to the client with affirmation messages about which CSV files have been received, the name of the files and the order in which they shall be processed. Include error messages and system messages if necessary. The JSON object shall contain the “DBIC”-array.

Step I.7 – Client: Analyze the Ajax response. Display success and error information. Display the number and name of files to be processed afterwards. Stop the time loop for polling jobs.

Phase II – Database import of a file in the pipeline

Step II.1 – Client: Prepare and start a new Ajax job with some parameters. The PHP target program of this job shall import the data of one of the already transferred CSV files. Among other things it should be defined, which file shall be processed (= imported into its associated database table) next. This parameter can follow the suggested order of the array which came from the server at the end of Phase I. All parameters can be set up in a separate (hidden) form with hidden input fields. Submit the Ajax job.

Step II.2 – Server: Start the database import on the server with a flexible PHP program. For small and medium sized files (up to approx. below 500000 lines) do it line by line by appropriate special PHP standard methods for handling CSV files. Check the data of each line where reasonable. Gather at least 20 lines in one INSERT statement to accelerate the import process. Write intermediate progress information into a $_SESSION array or a special database table. (This status information may be read by “polling jobs” started on the client.)
For huge files you may extend the import methods later on by using the special MySQL
command “LOAD DATA INFILE”.

Step II.3 – Client: Launch a sequence of status information polling jobs via a time loop. Handle the return information of each job in separate methods.

Step II.3 – Server: After a successful import of a defined file remove the file from its upload directory (delete it or move it somewhere else, e.g. in a history directory for uploads). Update you Control Array for the sequence of uploads with the following info: Which of the original files have been loaded? Which had errors? Which are still unprocessed? Prepare a JSON object for an Ajax respond (including the upload Control Array). Send it back to the client.

Step II.4 – Client: Analyze the server’s response. Stop any polling jobs issued after the previous submit. Continue with displaying information of the success of the database import of the handled file. Determine the next file to load. Continue with the elements of Step 5 described above.

Phases III + n – Client/Server – Loop:

Cycle through a sequence of Steps described under II.1 to II.4 for further phases until all files are processed or an error has occurred.

Enough for today. In the next article

CSV file upload with Zip containers, jQuery, Ajax and PHP 5.4 progress tracking – II

we shall cover some major elements of Phase I.