Products Resources Support About Us

Creating xlsx from csv in Unidata 8.2 using Python


#1

I’m running Unidata PE version 8.2 on Windows 10. The goal is to use the Python libraries in openpyxl to convert a tilde delimited flat file into an Excel file (xlsx).

My question is can openpyxl be added to and used by Python 3.4 that comes with Unidata or should I shell out and run a python program to accomplish the csv to xlsx conversion?

Thanks.

Tom Cordes


#2

Not Python, but I published FOSS on GitHub to convert CSV to XLSX. My clients and I use it from D3, U2, and other platforms.

[ad] This is used from NebulaXLite which allows BASIC to generate richly formatted multi-sheet workbooks, compared to simple delimited files. [/ad]

HTH


#3

Thanks, Tony.

I was looking for something that would run on Linux and would be able to apply formatting to a csv file when turned into xls/xlsx.

TomC


#4

OK - let’s piece this apart

  • You want to generate a XLSX workbook that can be opened by Excel.
  • The original challenge was to do this in Windows but it has expanded to Linux.
  • The challenge has been further expanded to add formatting, which is not present in typical delimited files.
  • XLSX allows for multiple spreadsheets to be in a single workbook, where CSV only includes one sheet/document per file. That might be a factor.
  • You’ve specified Python and openpyxl.

What’s not clear is whether you already have CSV files that you now want to enhance, or whether that term is being used generically to imply a source of data. This is important. You said you want formatting, which as stated above isn’t a part of simple delimited files. If you already have these files then the formatting needs to come from elsewhere. If you don’t already have these files, you’re starting from scratch, and you just want to create XLSX, then CSV might not be the right solution for the final challenge.

It’s also not clear if you really prefer to use Python, and specifically openpyxl, or if this is the vector you’re pursuing as your research has led you in this direction.

I point out these things because people frequently request a way to facilitate their approach to solve a problem/ That might be an unnecessary digression, as that approach might not be the best way to solve the problem. For example, many people say “how do I do reporting with ODBC” when their goal isn’t really ODBC, they really want anything that leads to better reporting. In this example, you’ve explicitly started with CSV and Python, I’m thinking that could be because that’s a vehicle you’ve found for getting Excel on Linux. I suggest that if that is the challenge that there are other options.

NebulaXLite allows you to create formatted, multi-sheet workbooks from BASIC. That includes formulas, proper date typing, merged cells, links, images, sized rows and columns, defined styles, and document-level properties like Company, Author, etc. With this, you don’t need Python - anyone who already works with BASIC can do what you’re doing. If your goal is really to use Python because that’s a strategic company goal or because you want to exercise your skills, then this isn’t a good fit and I humbly withdraw the suggestion. NebulaXLite is free for developers and only $200 for a server running production client reports.

The NebulaXConvert program referenced earlier is a .NET program that uses Windows-based libraries to repackage SpreadsheetML from NebulaXLite into a XLSX file. That’s Windows-only. It also uses Interop, which requires Excel on the system. We use this from Linux by saving the source file through file sharing into Windows. That triggers the conversion which saves the XLSX file into another folder, and then we read the resulting file back to Linux. This can then be deployed from Linux via email, FTP, served to a web page, or uploaded to Google Docs.

For a full Linux solution… The XLSX file is an OpenXML (OOXML) file (ECMA-376). The openpyxl utility combines the ability to manipulate a workbook (NebulaXLite) with the ability to save that workbook as XLSX (NebulaXConvert). I haven’t looked deeply into the openpyxl API but we may be able to use that to open a SpreadsheetML text file (which has multiple sheets, formatting, etc) and then save that as .XLSX. That would be cool. I just asked about this in the openpyxl forum. In the meantime, there are other options… assuming you just want to solve the problem, and you’re not firmly set on Python and openpyxl.

EPPlus is a popular .NET solution that works without Excel. It may already work over Linux via .NET Core. The OpenXML SDK did work via Mono over Linux and I would guess that this is all stable stuff now.

If you have any familiarity with Java, the Apache POI project may be of interest.

And for someone who is already familiar with PHP, there’s PhpSpreadsheet, and I do want to see if that can be used as a replacement for NebulaXConvert over Linux.

At this point I’m not leading in any specific direction. I’m suggesting that there are lots of ways to skin this cat. Without knowing your preferences and requirements it’s tough to make other specific recommendations.

HTH