JAVA | Parsing XML + Exporting to an Excel File

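While doing this month's paper search, I discovered that the latest paper lists are provided as XML!

Copying and pasting them into Excel one by one had always been a pain, so I took the chance to write a program that parses the XML and builds the Excel file too.

I searched on Naver, and some godlike person had already shared all the code I needed,

so I copy-pasted it as is and quickly tweaked it to fit my case.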

1. Add the Apache POI library to work with Excel from Java

2. Copy and paste the code

3. Change the tag names to match the XML data to be parsed

4. Adjust the Excel styles

1. Maven pom.xml

In Eclipse Luna, right-click the project name and choose Configure > Convert to Maven Project, and the file gets generated.

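<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>Paper_Search</groupId>
  <artifactId>Paper_Search</artifactId>
  <version>0.0.1-SNAPSHOT</version>

...

  <dependencies>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.12</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.12</version>
        </dependency>

        <dependency>
            <groupId>xpp3</groupId>
            <artifactId>xpp3</artifactId>
            <version>1.1.4c</version>
        </dependency>
  </dependencies>
</project>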

2. Paper_Search.java

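It is almost unchanged, but in the part that compares the tag and fetches the value, the parser oddly read the data correctly and then read one more null value each time. So the DTO object actually ended up holding null...

Probably, given [title]abcd[/title], tag.equals matches at [title] and again at [/title]? (The variable tag still holds the same name after the closing tag, so the whitespace that follows it likely arrives as another TEXT event and overwrites the value.) Making good use of the event type would probably solve it, but I couldn't be bothered, so I just added a flag variable, haha.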
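For reference, here is a rough sketch of what "making good use of the event type" could look like. It is not the code I actually ran: to keep it runnable without the xpp3 jar it uses the JDK's own StAX pull parser (XMLStreamReader) instead of XmlPullParser, and the class name EventTypeSketch and method readItemTitles are made up for this example. The idea is to clear the current tag name on every end tag, so whitespace after a closing tag can never overwrite a stored value. The same idea should carry over to XmlPullParser's START_TAG / TEXT / END_TAG events, though I have not tested that here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the original program.
class EventTypeSketch {

    // Returns the text of the <title> element of every <item>, in document order.
    static List<String> readItemTitles(String xml) {
        List<String> titles = new ArrayList<String>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver each run of text as a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

            String currentTag = null; // set on a start tag, cleared on every end tag
            boolean inItem = false;
            String title = null;

            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    currentTag = reader.getLocalName();
                    if ("item".equals(currentTag)) {
                        inItem = true;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    // Whitespace that follows a closing tag arrives while
                    // currentTag is null, so it cannot overwrite a stored value.
                    if (inItem && "title".equals(currentTag)) {
                        title = reader.getText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if ("item".equals(reader.getLocalName())) {
                        titles.add(title);
                        title = null;
                        inItem = false;
                    }
                    currentTag = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return titles;
    }
}
```

Tracking inItem also keeps the feed-level title of the channel from leaking into the first paper, which a single flag variable does not guard against.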

And since I'm parsing five feeds, I used a for loop and String arrays!

import java.io.BufferedInputStream;
import java.net.URL;
import java.util.Date;
import java.util.ArrayList;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserFactory;
 
public class Paper_Search {
 
    public final static String URL[] = 
    	{"http://ieeexplore.ieee.org/rss/TOC2220.XML",
    	"http://ieeexplore.ieee.org/rss/TOC4.XML",
    	"http://ieeexplore.ieee.org/rss/TOC8919.XML",
    	"http://ieeexplore.ieee.org/rss/TOC8920.XML",
    	"http://ieeexplore.ieee.org/rss/TOC92.XML"};
     
    public Paper_Search(){
        
        try {
            apiParserSearch();
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        
    }
    
    public void apiParserSearch() throws Exception {
        
        
        ArrayList<ParmDTO> list = new ArrayList<ParmDTO>();
        
        String publication=null,year=null,month=null,day=null,
                title=null,link=null,description=null,authors=null;
        
        int tag_flag=0;
        
        String pub[] ={"EL","JSSC","TCAS1","TCAS2","TVLSI"};
        
        for(int i=0; i<5 ;i++){
        	        	
        	URL url = new URL(URL[i]);
        	
        	publication = pub[i];
        	 
            XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XmlPullParser xpp = factory.newPullParser();
            BufferedInputStream bis = new BufferedInputStream(url.openStream());
            xpp.setInput(bis, "utf-8");
            
            String tag = null;
            int event_type = xpp.getEventType();
            
	        while (event_type != XmlPullParser.END_DOCUMENT) {
	        	
	        	if (event_type == XmlPullParser.START_TAG) {
	                tag = xpp.getName();
	                tag_flag = 0;
	            } else if (event_type == XmlPullParser.TEXT  && tag_flag == 0) {
	                // Fetch the value for each tag.
	            	if(tag.equals("year")){
	                    year = xpp.getText();
	                }else if(tag.equals("issue")){
	                    month = xpp.getText();
	                }
	                ...
	            } else if (event_type == XmlPullParser.END_TAG) {
	                tag = xpp.getName();
	                tag_flag=1;
	                
	                if (tag.equals("item")) {	                		
	                	// Create a DTO object and put the parsed data into it.
	                    ParmDTO entity = new ParmDTO();
	                    entity.setPublication(publication);
	                    entity.setYear(year);
	                    ...
	                    	                    
	                    list.add(entity);
	                }
	            }
	 
	            event_type = xpp.next();
	        }
	        
	        System.out.println("Parsing "+pub[i]+" "+day);
 
        }
            
        // This is the part that writes the Excel file.
        new ExcelWriterService().makeExcelFileParmList(list);
           
    }
        
    public static void main(String args[]){
    	java.util.Calendar cal = java.util.Calendar.getInstance();     
        Date time = cal.getTime();
    	System.out.println(time);
        new Paper_Search();
    }    
}

3. ParmDTO.java

For this one I really only changed the variable names, haha.

public class ParmDTO {

    private String publication=null,year=null,month=null,
    		day=null,title=null,link=null,description=null,authors=null;
    
    public String getPublication() {
        return publication;
    }
    public void setPublication(String publication) {
        this.publication = publication;
    }
    
    public String getYear() {
        return year;
    }
    public void setYear(String year) {
        this.year = year;
    }  
    ...
}

4. ExcelWriterService.java

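This is the part I put the most coding effort into, lol. To add a search feature I made it read a txt file, and since special characters were written as Unicode character references I also added code to convert them into the actual characters ~ I'll write separate posts about those later~

I tried to add a hyperlink feature but failed. Why doesn't it work?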
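As for the special-character conversion mentioned above, here is a small standalone sketch of the same idea using a regex instead of split(). The class name NumericEntitySketch and method decodeHexReferences are made up for this example and are not in my program. It only assumes the references look like &#x....; with hex digits, which is what I saw in the titles.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
class NumericEntitySketch {

    // Matches hexadecimal character references such as &#x3bc; or &#x1F600;
    private static final Pattern HEX_REF = Pattern.compile("&#x([0-9A-Fa-f]{1,6});");

    static String decodeHexReferences(String text) {
        Matcher m = HEX_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint)) // also covers code points above U+FFFF
                    : m.group();                               // out of range: leave the reference untouched
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Compared with cutting a fixed four characters out of each piece, this does not care how many hex digits a reference has and does not throw on a short trailing fragment.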

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.*;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.ss.usermodel.IndexedColors;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
 
public class ExcelWriterService {
 
    public ExcelWriterService() {
 
    }
 
    public void makeExcelFileParmList(ArrayList<ParmDTO> list) 
                                              throws FileNotFoundException {
    	
    	// Read the keywords to compare against from a txt file into a list,
    	// then convert the list to a String array.
    	String line;
    	
    	List<String> mList = new ArrayList<String>();
		// try-with-resources closes the reader and the underlying stream even on failure
		try (BufferedReader br = new BufferedReader(
				new InputStreamReader(new FileInputStream(new File("key.txt"))))) {
	        while((line = br.readLine())!=null){
	            // skip blank lines: an empty keyword would match every title
	            if (!line.trim().isEmpty()) mList.add(line);
	        } 
	        System.out.println("Keyword : "+mList);    
	        
		} catch (IOException e1) {
			e1.printStackTrace();
		}        
		String[] keyword = mList.toArray(new String[mList.size()]);
        
		Workbook workbook = new XSSFWorkbook();
        
        // Set the sheet names
        Sheet sheet = workbook.createSheet("All");
        Sheet sheet2 = workbook.createSheet("Sel");
        
        // Show sheet2 when the file is opened
        sheet2.setSelected(true);
        workbook.setActiveSheet(1);
        
        Row row;
        
        // Column width
        sheet.setColumnWidth(4, 20000);
        sheet2.setColumnWidth(4, 20000);
  
        // Freeze the header row
        sheet.createFreezePane( 0, 1, 0, 1 );
        sheet2.createFreezePane( 0, 1, 0, 1 );

        
        // *** Style--------------------------------------------------
        // Create the cell style
        CellStyle cellStyle = workbook.createCellStyle();
        	cellStyle.setWrapText(true);
        ...
        
        // Create the header row
        row = sheet.createRow(0);
        row.createCell(0).setCellValue("Publication");
        row.createCell(1).setCellValue("Year");
        ...    
        for(int i=0; i<10; i++){
    		row.getCell(i).setCellStyle(cellStyle_header);
    	}
        
        row = sheet2.createRow(0);
        row.createCell(0).setCellValue("Publication");
        ...
        
        for(int i=0; i<10; i++){
    		row.getCell(i).setCellStyle(cellStyle_header);
    	}
 
        int count = 1;
        int count_sel = 1;

    	String title,des,url;
    	String authors;
    	String [] author = {""};
    	
    	
    	for (ParmDTO entity : list) {
            row = sheet.createRow(count);
            count = count + 1;
 
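            title = entity.getTitle();
            // Convert Unicode character references (&#x....;) in the title into the actual characters
            if (title != null && title.contains("&#x")){
            	String [] tsplit = title.split("&#x");
            
	            for (int b=1 ;b<tsplit.length; b++) {
	            	int end = tsplit[b].indexOf(';');
	            	if (end < 1) continue;   // no terminating ';', leave this piece as is
	            	String unicode = tsplit[b].substring(0, end);
	            	String old = "&#x"+unicode+";";
	            	try {
	            		int hex = Integer.parseInt(unicode, 16);
	            		title = title.replace(old, new String(Character.toChars(hex)));
	            	} catch (IllegalArgumentException e) {
	            		// not a valid hex code point, leave the reference as is
	            	}
	            }
            }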
           
            des = entity.getDescription();
                              
            // Split the author list so that only the first author is used
            authors = entity.getAuthors();
            if(authors!=null){
            	author = authors.split(";");
            }
            
            // Change the URL from the detail page to the PDF view page
            url = entity.getLink().replace
                        ("xpl/articleDetails.jsp?", "stamp/stamp.jsp?tp=&");
            
            row.createCell(0).setCellValue(entity.getPublication());
            row.createCell(1).setCellValue(entity.getYear());
            ...

            // Keyword search
            boolean Key_flag=false;
            String Key_sel = "";
            
            for (int i=0; i<keyword.length;i++){
	            if ((title != null && title.contains(keyword[i]))
	            		|| (des != null && des.contains(keyword[i]))){ 
	            	Key_flag=true;
	            	// compare the content, not the reference: != on Strings is unreliable
	            	if (!Key_sel.isEmpty()) Key_sel = Key_sel + ", "+keyword[i];
	            	else Key_sel = keyword[i];
	            }
            }
            if (Key_flag) {
            	for(int i=0; i<10; i++){
            		row.getCell(i).setCellStyle(cellStyle_true);
            	}            	
            	row = sheet2.createRow(count_sel);
            	row.createCell(0).setCellValue(entity.getPublication());
                row.createCell(1).setCellValue(entity.getYear());
                ...
            	count_sel = count_sel +1;
            }
            
        }
        
        java.util.Calendar cal = java.util.Calendar.getInstance();
        int year=cal.get(Calendar.YEAR);
        int month=cal.get(Calendar.MONTH)+1;
        int date=cal.get(Calendar.DATE);
        
        String now = String.format("%04d.%02d.%02d",year, month,date);
        String fileNm = now+" Latest Paper.xlsx";
        FileOutputStream fos;
        try {
            fos = new FileOutputStream(fileNm);
            workbook.write(fos);
            fos.close();
            workbook.close();
          System.out.println("done");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
 
 
}
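The keyword search inside the loop above can also be pulled out into a small helper. This is just a sketch under my own assumptions: the class name KeywordMatchSketch and method matchedKeywords are made up for this example, and matching stays case-sensitive like in the code above. It skips blank keyword lines and tolerates a missing title or description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class KeywordMatchSketch {

    // Returns the matching keywords joined by ", ", or "" when nothing matches.
    static String matchedKeywords(List<String> keywords, String title, String description) {
        List<String> hits = new ArrayList<String>();
        for (String raw : keywords) {
            String keyword = raw.trim();
            if (keyword.isEmpty()) {
                continue; // "" is contained in every string, so a blank line would match every paper
            }
            boolean inTitle = title != null && title.contains(keyword);
            boolean inDescription = description != null && description.contains(keyword);
            if (inTitle || inDescription) {
                hits.add(keyword);
            }
        }
        return String.join(", ", hits);
    }
}
```

With a helper like this, a row would go to the "Sel" sheet whenever the returned string is not empty, so the separate boolean flag is no longer needed.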

※ Sites I referred to

1. http://blog.naver.com/yandul83/220375184664

2. http://zero-gravity.tistory.com/237

3. http://poi.apache.org/
