Node.js large file chunked upload

1. What is chunked upload

Chunked upload means splitting a large file into several blocks and transmitting them one by one. The main benefit is reducing the cost of re-uploading. A large file takes a long time to upload, and an unstable network can easily interrupt the transfer. Without chunking, the user has no choice but to upload the whole file again. With chunked upload, if the transfer is interrupted we only need to reselect the file and send the remaining chunks; the whole file does not have to be retransmitted, which greatly reduces the cost of a retry.

But how do we choose a suitable chunk size? Consider the following:

1. The smaller the chunk, the more requests there are and the greater the overhead, so it must not be set too small.
2. The larger the chunk, the less flexibility there is (an interrupted chunk costs more to retransmit).
3. The server side typically has a fixed-size receive buffer, so it helps if the chunk size is an integral multiple of that value.

Weighing these, a chunk size of 2 MB to 5 MB is recommended; the exact value should be chosen according to the file size: about 5 MB for very large files and about 2 MB for relatively small ones.
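As a sketch, a simple heuristic along these lines could pick the size (the 100 MB threshold here is an arbitrary illustration, not a fixed rule):

// Pick 5 MB chunks for big files and 2 MB otherwise; the cut-off is arbitrary.
function pickChunkSize(fileSize) {
  const MB = 1024 * 1024;
  return fileSize > 100 * MB ? 5 * MB : 2 * MB;
}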

The steps of chunked file upload are as follows:

1. First, compute an MD5 hash of the file. The advantage of MD5 is that it uniquely identifies the file, and the server can also use it to verify file integrity.
2. With the MD5 value, the server checks whether the file has already been uploaded. If it has, there is no need to upload it again.
3. Slice the large file. For example, a 100 MB file with 5 MB chunks is uploaded in 20 requests.
4. Ask the backend which chunks have already been uploaded (a sketch of this request follows the list). Why send this request? Think of the resumable transfer feature in Baidu Netdisk: if I stop halfway through a file and leave work, the server should remember which chunks it already received, so that when I upload again it skips those and only the remaining chunks are sent.
5. Upload the chunks that have not been uploaded yet.
6. After all chunks are uploaded successfully, ask the server to merge them into the final file.
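A minimal sketch of the request in steps 2 and 4, assuming a hypothetical '/file/check' endpoint; neither the endpoint nor the response shape exists in the server code below, they are illustrations only:

// Hypothetical pre-upload check: ask the server what it already has for this hash.
async function checkFile(hash) {
  const res = await axios.post('/file/check', { hash });
  // assumed response shape: { uploaded: false, chunks: [0, 1, 5] }
  return res.data;
}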




2. Understanding the slice method of the Blob object for splitting files

See my earlier post: Using the Blob object to upload large files in pieces.

A Blob object has two attributes, size and type, and its prototype has a slice() method. We can use this method to cut up our binary Blob objects.

blob.slice(startByte, endByte) is a method of the Blob object. The File object inherits from Blob, so File objects have the slice method as well.

Parameters:
startByte: the byte offset at which reading starts.
endByte: the byte offset (exclusive) at which reading ends.

Return value: a new Blob object containing the bytes in the specified range.
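For example:

const blob = new Blob(['hello world']);
const part = blob.slice(0, 5);                // bytes 0-4, i.e. 'hello'
console.log(part instanceof Blob, part.size); // true 5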

We can use the blob.slice() method to cut a binary Blob object, but older browsers shipped the method under vendor prefixes, so we can encapsulate a compatibility wrapper as follows:

function blobSlice(blob, startByte, endByte) {
  if (blob.slice) {
    return blob.slice(startByte, endByte);
  }
  // compatible firefox
  if (blob.mozSlice) {
    return blob.mozSlice(startByte, endByte);
  }
  // compatible webkit
  if (blob.webkitSlice) {
    return blob.webkitSlice(startByte, endByte);
  }
  return null;
}
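With this wrapper in place, splitting a File (which inherits from Blob) into fixed-size chunks could look like this:

// Split a File into fixed-size chunks using the compatibility wrapper above.
function splitFile(file, chunkSize) {
  const chunks = [];
  for (let start = 0; start < file.size; start += chunkSize) {
    chunks.push(blobSlice(file, start, Math.min(file.size, start + chunkSize)));
  }
  return chunks;
}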

3. Specific implementation

$(document).ready(() => {
  const chunkSize = 2 * 1024 * 1024; // set each chunk to 2 MB
  // Use the Blob.slice method to split the file; the method is vendor-prefixed
  // in some older browsers, hence the fallbacks below.
  const blobSlice = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
  const hashFile = (file) => {
    return new Promise((resolve, reject) => { 
      const chunks = Math.ceil(file.size / chunkSize);
      let currentChunk = 0;
      const spark = new SparkMD5.ArrayBuffer();
      const fileReader = new FileReader();
      function loadNext() {
        const start = currentChunk * chunkSize;
        const end = start + chunkSize >= file.size ? file.size : start + chunkSize;
        fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
      }
      fileReader.onload = e => {
        spark.append(e.target.result); // Append array buffer
        currentChunk += 1;
        if (currentChunk < chunks) {
          loadNext();
        } else {
          console.log('finished loading');
          const result = spark.end();
          // If only `result` were used as the hash, two files with identical content
          // but different names could not both be kept, so the file name is appended.
          const sparkMd5 = new SparkMD5();
          sparkMd5.append(result);
          sparkMd5.append(file.name);
          const hexHash = sparkMd5.end();
          resolve(hexHash);
        }
      };
      fileReader.onerror = () => {
        console.warn('File read failed!');
        reject(new Error('File read failed'));
      };
      loadNext();
    });
  }
  const submitBtn = $('#submitBtn');
  submitBtn.on('click', async () => {
    const fileDom = $('#file')[0];
    // files is a FileList of File objects; if multiple selection is allowed, there can be several.
    const files = fileDom.files;
    const file = files[0];
    if (!file) {
      alert('No files obtained');
      return;
    }
    const blockCount = Math.ceil(file.size / chunkSize); // total number of chunks
    const axiosPromiseArray = []; // axiosPromise array
    const hash = await hashFile(file); //file hash 
    // Once we have the file hash, resumable upload can be implemented by checking the hash
    // on the server: has the file been fully uploaded, and which chunks have been received.
    console.log(hash);
    
    for (let i = 0; i < blockCount; i++) {
      const start = i * chunkSize;
      const end = Math.min(file.size, start + chunkSize);
      // Build the form data for this chunk
      const form = new FormData();
      form.append('file', blobSlice.call(file, start, end));
      form.append('name', file.name);
      form.append('total', blockCount);
      form.append('index', i);
      form.append('size', file.size);
      form.append('hash', hash);
      // Submit the chunk via ajax; the content-type here is multipart/form-data
      const axiosOptions = {
        onUploadProgress: e => {
          // handle upload progress
          console.log(blockCount, i, e, file);
        },
      };
      // Add the request promise to the array
      axiosPromiseArray.push(axios.post('/file/upload', form, axiosOptions));
    }
    // After all chunks are uploaded, ask the server to merge the chunk files
    await axios.all(axiosPromiseArray).then(() => {
      // merge chunks
      const data = {
        size: file.size,
        name: file.name,
        total: blockCount,
        hash
      };
      axios.post('/file/merge_chunks', data).then(res => {
        console.log('Upload succeeded');
        console.log(res.data, file);
        alert('Upload succeeded');
      }).catch(err => {
        console.log(err);
      });
    });
  });
})

We first compute the total number of chunks, then loop over that count, instantiate a FormData object for each chunk, and append the corresponding chunk (plus its metadata) to it.

Each chunk is then posted to '/file/upload', and the resulting promises are collected in the axiosPromiseArray array. Once all chunks have been uploaded, await axios.all(axiosPromiseArray).then(() => {...}) resolves, and finally we call '/file/merge_chunks' to merge the file.

const Koa = require('koa');
const app = new Koa();
const Router = require('koa-router');
const multer = require('koa-multer');
const serve = require('koa-static');
const path = require('path');
const fs = require('fs-extra');
const koaBody = require('koa-body');
const { mkdirsSync } = require('./utils/dir');
const uploadPath = path.join(__dirname, 'uploads');
const uploadTempPath = path.join(uploadPath, 'temp');
const upload = multer({ dest: uploadTempPath });
const router = new Router();
app.use(koaBody());
/**
 * single(fieldname)
 * Accept a single file with the name fieldname. The single file will be stored in req.file.
 */
router.post('/file/upload', upload.single('file'), async (ctx, next) => {
  console.log('file upload...')
  // Create a folder named after the file hash and move the uploaded temp file into it, to simplify the later merge.
  const {
    name,
    total,
    index,
    size,
    hash
  } = ctx.req.body;

  const chunksPath = path.join(uploadPath, hash, '/');
  if(!fs.existsSync(chunksPath)) mkdirsSync(chunksPath);
  fs.renameSync(ctx.req.file.path, chunksPath + hash + '-' + index);
  ctx.status = 200;
  ctx.res.end('Success');
})

router.post('/file/merge_chunks', async (ctx, next) => {
  const {    
    size, 
    name, 
    total, 
    hash
  } = ctx.request.body;
  // Locate the chunk files according to the hash value,
  // create the target file, and merge the chunks into it.
  const chunksPath = path.join(uploadPath, hash, '/');
  const filePath = path.join(uploadPath, name);
  // Read all chunk file names into an array
  const chunks = fs.readdirSync(chunksPath);
  // Validate the chunk count first (note: total arrives as a string in the form body)
  if (chunks.length === 0 || chunks.length !== parseInt(total, 10)) {
    ctx.status = 200;
    ctx.res.end('The number of chunk files does not match');
    return;
  }
  // Create the target file
  fs.writeFileSync(filePath, '');
  for (let i = 0; i < total; i++) {
    // Append write to file
    fs.appendFileSync(filePath, fs.readFileSync(chunksPath + hash + '-' +i));
    // Delete this used chunk    
    fs.unlinkSync(chunksPath + hash + '-' +i);
  }
  fs.rmdirSync(chunksPath);
  // After the merge succeeds, the file's metadata can be saved to the database.
  ctx.status = 200;
  ctx.res.end('Merge succeeded');
})
app.use(router.routes());
app.use(router.allowedMethods());
app.use(serve(__dirname + '/static'));
app.listen(9000, () => {
  console.log('Service started on port 9000');
});
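Neither route above implements the lookup from steps 2 and 4. A minimal sketch of such a route, which would be registered alongside the others before app.use(router.routes()), assuming the same directory layout (the '/file/check' path and the response shape are my own inventions):

// Hypothetical check route: report whether the merged file exists and which
// chunk indexes are already stored under uploads/<hash>/.
router.post('/file/check', async (ctx) => {
  const { hash, name } = ctx.request.body;
  const uploaded = fs.existsSync(path.join(uploadPath, name));
  const chunksPath = path.join(uploadPath, hash, '/');
  const chunks = fs.existsSync(chunksPath)
    ? fs.readdirSync(chunksPath).map(f => parseInt(f.split('-').pop(), 10))
    : [];
  ctx.body = { uploaded, chunks };
});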

utils/dir.js: this helper checks whether a directory exists; if it does, it returns true directly, otherwise it creates the directory, recursively creating parents as needed:

const path = require('path');
const fs = require('fs-extra');
const mkdirsSync = (dirname) => {
  if (fs.existsSync(dirname)) {
    return true;
  } else {
    // Make sure the parent directory exists first, then create this one
    if (mkdirsSync(path.dirname(dirname))) {
      fs.mkdirSync(dirname);
      return true;
    }
  }
}
module.exports = {
  mkdirsSync
};
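On Node 10.12 and later, the built-in recursive option of fs.mkdirSync can replace this helper entirely:

// Equivalent on Node >= 10.12: creates any missing parent directories in one call.
fs.mkdirSync(dirname, { recursive: true });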

Let's first look at the '/file/upload' request. When a chunk arrives and the callback succeeds, the route creates an uploads directory (plus a per-hash subfolder) in the project root and moves the temp file into it.

In the browser's network panel we can also see many '/file/upload' requests, confirming that the file is uploaded in chunks.

Finally, after all chunk requests have succeeded, we call '/file/merge_chunks' to merge the file: the chunk files are located by their hash value, and then we loop over the total chunk count and append every chunk to the file at filePath:

fs.appendFileSync(filePath, fs.readFileSync(chunksPath + hash + '-' +i));
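Note that readFileSync loads each chunk fully into memory before appending it. For very large chunks, a stream-based merge is a possible alternative; a minimal sketch (not the code used above):

// Append the chunks sequentially through streams instead of buffering them whole.
async function mergeChunks(chunksPath, filePath, total, hash) {
  const ws = fs.createWriteStream(filePath);
  for (let i = 0; i < total; i++) {
    await new Promise((resolve, reject) => {
      const rs = fs.createReadStream(path.join(chunksPath, `${hash}-${i}`));
      rs.on('error', reject);
      rs.on('end', resolve);
      rs.pipe(ws, { end: false }); // keep the write stream open between chunks
    });
  }
  ws.end();
}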

filePath is obtained with const filePath = path.join(uploadPath, name); that is, the merged file lives in the uploads folder in the project root. The point of keeping the chunks on disk is that if the network drops or the server fails halfway through an upload, the chunks that did arrive are already saved. When the upload continues, the chunks already received can be skipped and only the missing ones uploaded. This lays the groundwork for resumable (breakpoint) upload, which I will analyze next time.

The above is the basic principle of chunked upload; resumable upload is not implemented yet, and I will analyze its principle when I have time. The idea is straightforward: if the network or the server is interrupted during an upload, the chunks received so far are kept; when the connection recovers, the client compares its local chunk list against what the server has recorded for the file's hash, skips every chunk that was already uploaded, and continues with the next missing one, and so on. The comparison costs a little time, but compared with the time a full re-upload would consume, it is nothing.
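In code, the skip can be as simple as filtering the chunk loop against the list the server returns. A sketch, reusing the hypothetical checkFile helper from section 1:

// Upload only the chunks the server does not have yet (uploadedChunks is the
// assumed `chunks` array from the hypothetical /file/check response).
async function uploadMissingChunks(file, hash, blockCount) {
  const { chunks: uploadedChunks } = await checkFile(hash);
  for (let i = 0; i < blockCount; i++) {
    if (uploadedChunks.includes(i)) continue; // already on the server, skip it
    // ...build the FormData and post the chunk exactly as in section 3
  }
}

Everything else reuses the chunk upload flow from section 3.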
