R – the choice for serious analysis

While Steve is a little “under the weather” (it must be all the snow that Al Gore sent him), I thought I’d mention an interesting article in the New York Times which sings the praises of the programming language R.

R is similar to other programming languages, like C, Java and Perl, in that it helps people perform a wide variety of computing tasks by giving them access to various commands. For statisticians, however, R is particularly useful because it contains a number of built-in mechanisms for organizing data, running calculations on the information and creating graphical representations of data sets.

Some people familiar with R describe it as a supercharged version of Microsoft’s Excel spreadsheet software that can help illuminate data trends more clearly than is possible by entering information into rows and columns.

What makes R so useful — and helps explain its quick acceptance — is that statisticians, engineers and scientists can improve the software’s code or write variations for specific tasks. Packages written for R add advanced algorithms, colored and textured graphs and mining techniques to dig deeper into databases.

So there you have it: R is the future.

Give up the Fortran, Mike and the Hockey Team, and join the 21st Century. Uncle Steve does know best.

R can be downloaded at http://www.r-project.org/ and is available for Windows, Linux and Mac OS X. And it’s free.

This entry was written by John A, posted on Jan 8, 2009 at 4:40 AM, filed under Data, Scripts and tagged R.

80 Comments

Michael G.

Posted Jan 8, 2009 at 6:31 AM

Longtime lurker here. I’m an economist and haven’t yet tackled R, but a grad student of mine has used it extensively in her work and she likes it a great deal. For my money though, and it’s never my money, I haven’t found any statistical software superior to Stata.
Rich

Posted Jan 8, 2009 at 7:20 AM

I looked at the Stata web page and noticed a link, “Purchase Stata”. That’s as far as I got.

Matt Briggs’s book has lots of cool stuff on using R. Just the explanation of what “~” means almost makes it worth it but there’s a lot of excellent stuff on statistics too. The link’s in the sidebar.

There’s no hiding the fact, though, that getting comfortable with R is a long journey.
- John A
  
  Posted Jan 8, 2009 at 6:01 PM
  
  Re: Rich (#2),
  
  Which book are you referring to?
  - Rich
    
    Posted Jan 9, 2009 at 6:09 AM
    
    Re: John A (#26),
    
    “Breaking the Law of Averages”.
    
    All the examples are in R and all of it’s downloadable. (Not the book, the examples).
_Jim

Posted Jan 8, 2009 at 7:59 AM

R is similar to other programming languages, like C

Include files?

Declaration sections?

Subroutines?

Serial/comms libraries?

Source-level debugger?

Optimization for target?

Wow …
bernie

Posted Jan 8, 2009 at 9:00 AM

Powerful statistical packages are not a substitute for understanding what you are doing, whether you should be doing it, and what you can really say once you have done it. Just ask the HS Team!

Mastering R is one of my 2009 resolutions.
Steve McIntyre

Posted Jan 8, 2009 at 9:41 AM

Thanks, John A, for making this thread.

When I stumbled into climate a few years ago, I had no intellectual investment in another language. I’d learned Fortran at university in the 1960s but hadn’t looked at a Fortran code since then. Nor had I had any occasion to do any programming while I was doing business.

So I had a clean slate. My first analyses were in Excel, but as soon as I started with tree ring measurement data, the spreadsheets exploded. Plus I had an older machine at the time and it froze a lot. So I needed to figure out a better way.

I started on R because it was free. The other choice was Matlab.

Once I started on it, there were several features that made it ideal for my usage. I download a LOT of data. R enabled you to insert a url into your script and grab your data without manual intervention. (I’m sure that you can do the same thing in Matlab, but it was something that I could do in R right away.)
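For readers who prefer Python, here is a minimal sketch of the same idea: the script pulls a plain-text table straight from a URL, so there is no manual download step. It assumes a whitespace-delimited file with one header line and "NA" marking missing values; `read_table_from_url` is a hypothetical helper, not a library function. In R the rough equivalent is `read.table(url, header = TRUE)`.

```python
import urllib.request


def read_table_from_url(url, header=True):
    """Fetch a whitespace-delimited text table and return (columns, rows).

    Hypothetical helper: values are parsed as floats and the string "NA"
    becomes float("nan"), similar to R's default na.strings.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    # Split each non-blank line into fields.
    lines = [line.split() for line in text.splitlines() if line.strip()]
    # Use the first line as column names, or R-style V1, V2, ... otherwise.
    columns = lines.pop(0) if header else [f"V{i + 1}" for i in range(len(lines[0]))]
    rows = [[float("nan") if v == "NA" else float(v) for v in line] for line in lines]
    return columns, rows
```

In a real script the argument would be the address of a data file on an archive's web server; the call is the same whether the URL points at HTTP, FTP or a local `file:` path.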

I still remembered matrices, vectors and functions and thought mathematically. So I could design functions right away.

R is concise enough as a script that it is easy to document. When I look at Fortran scripts, they have line after line devoted to do loops and index management that’s done within the R vectors or functions. When I read R scripts from Mann’s group (or Ammann), they all too often look like Fortran code written in a different language. Too many do loops; too many indices; not enough vectors; not enough use of logical vectors; not enough use of “factors” and data.frames; not enough use of functions. It’s ugly code – sort of like Hansen’s GISTEMP.
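To make the loops-versus-vectors point concrete, here is a sketch in Python/NumPy (chosen only for illustration) that computes anomalies against a baseline period twice: once Fortran-style with loops and index bookkeeping, once with a logical vector. The baseline-anomaly task and the function names are illustrative assumptions. The vectorized version corresponds roughly to the R expression `temps - mean(temps[years >= start & years <= end])`.

```python
import numpy as np


def anomalies_with_loops(years, temps, start, end):
    """Fortran-style: explicit indices and accumulators."""
    total, count = 0.0, 0
    for i in range(len(years)):
        if start <= years[i] <= end:
            total += temps[i]
            count += 1
    baseline = total / count
    result = []
    for i in range(len(temps)):
        result.append(temps[i] - baseline)
    return result


def anomalies_vectorized(years, temps, start, end):
    """Vector style: a logical mask picks the baseline years, no indices."""
    years = np.asarray(years)
    temps = np.asarray(temps, dtype=float)
    in_baseline = (years >= start) & (years <= end)  # logical vector
    return temps - temps[in_baseline].mean()
```

Both functions return the same numbers; the second simply states what is being computed instead of how to walk the arrays.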

Not that my code is any marvel, but at least I try to use the resources of the language. I’m probably at the stage where it would be interesting to take some sort of advanced refresher seminar. Now that I think of it, such a seminar at AGU would probably be wildly successful.
bill-tb

Posted Jan 8, 2009 at 10:45 AM

“R” like “C”? Hardly. As Jim said, the comparison is not valid. “R” is a high-level language, and a good one at that, in that it enables non-computer types to do extraordinary things without the burden and minutiae of a lower-level language like C. R is far better than programs like Excel, although you can get “R” libraries for Excel; they’re not as good as the real deal, but can be a transition for someone who uses Excel all the time.

But it is not C by any stretch. If you want to write “R” itself, use C; if you want to do engineering, science, analysis and statistics, use “R”.

But then again, what do you expect from the New York Times these days.

R’s powerful, and it’s free. If you use Linux/UNIX, and who doesn’t these days – VBG – then all the official packages are usually available through your distribution’s package manager, including some GUI tools. I like Ubuntu, and it’s all there for the picking.

If you want to poke around with another easy, general-purpose language, might I recommend Python. It’s like BASIC, but allows you to do low-level things in Linux that can eliminate hours of programming effort with just a little program.

NIXes rock. Did I mention, it’s free?
counters

Posted Jan 8, 2009 at 11:02 AM

In the spirit of helping fellow programmers, let me add a bit to bill-tb’s suggestion on adopting Python. There is currently a project called RPy which aims to create a Python interface to R. Any up-and-coming programmer will likely have been weaned on Java or C++, and the transition to Python from either language is very simple. Python provides a language with immense flexibility; it is just as suitable for writing large-scale applications as it is for writing short scripts. Adding the RPy module to your standard Python set-up gives you direct access to a wide variety of R’s analytical tools, and also expands the functionality of Python for creating plots, documents, or whatever.

A caveat, though: it’s not really fair to harp on people for sticking with FORTRAN. Nearly the entire field of atmospheric science depends on FORTRAN; the majority of the current generation GCM and NWP models are written in it, primarily because of very good cross-platform compiler support. The language is still taught to undergrads in the field so that they may work with all the legacy code that is floating around, and many scientists continue to use it.

The up-and-coming generation, however, is much more computer savvy, and will bring new skills – and new languages – to the table which should help modernize things a bit. Hell, with the genuine onset of parallel-programming and GP-GPU programming, a lot of things will likely change in the coming years.
- Kusigrosz
  
  Posted Jan 8, 2009 at 12:15 PM
  
  Re: counters (#7), Just in case any Python code gets posted here, it would be nice if it appeared in fixed font. You know why.
  - Not sure
    
    Posted Jan 8, 2009 at 12:34 PM
    
    Re: Kusigrosz (#11),
    
    I wonder
    what
    wordpress
    will do
    to
    Python.
    
    Heavy worries!
  - counters
    
    Posted Jan 8, 2009 at 2:00 PM
    
    Re: Kusigrosz (#11), Of course 🙂
Mark T.

Posted Jan 8, 2009 at 11:21 AM

I learned FORTRAN in college, but not surprisingly, I keep forgetting that I took that class in 1988 or 1989.

Does R have libraries for dealing with multiple processors yet? MATLAB recently came out with a Parallel Computing Toolbox which I’m thinking about doing a trial with. The benefit of ease of use with these high-level languages is traded for decreased performance which, in some cases, can be a significant drawback. However, now that the typical PC has multiple processors, complex routines can be handled in a more reasonable time-frame.

I keep meaning to install R just so I can do the same things that Steve is doing without translation. I already have a copy of MATLAB at home and I probably will always have a copy simply because I use it so much. Nothing wrong with living in both worlds.

Mark
sonicfrog

Posted Jan 8, 2009 at 11:55 AM

From a Slashdot comment on R:

“Googling for ‘R’ turns up way too much noise and way too little signal.”
Steve McIntyre

Posted Jan 8, 2009 at 12:08 PM

#9. I google “r-project” plus a term.
Lars Kamél

Posted Jan 8, 2009 at 12:36 PM

I will NEVER give up Fortran! 🙂
Dave Dardinger

Posted Jan 8, 2009 at 1:43 PM

Programming in R
compared to Fortran or C.
A rose or a thorn?
Dan Evens

Posted Jan 8, 2009 at 2:15 PM

It is possible to write terrible code in any language.

That being said, it is easier to write terrible code in some languages. I’ve seen some mighty unpleasant progs written in FORTRAN. And some of the worst offenders have been in C or C++.

Pick the right language for the project. Then get yourself a few good texts on writing good code that can be maintained, easily tested, easily documented, easily explained to your target audience.
MC

Posted Jan 8, 2009 at 2:17 PM

I downloaded a Windows version of R from r-project. Easy to install. The best bit was when I realised that you can run scripts using the script window as is instead of writing functions. So you can just copy Steve/Roman’s scripts into a script editor and save it as a text file or an R file. When you want to run anything, just highlight it and then press the ‘Run Line or Selection’ button. Easy.
I approached it a bit like Matlab, though R appears to run faster. R also seems not to be too fussed if a text file mixes text with numbers and there’s that very handy NA flag that can be used for averaging.
It is so worth putting it on your pc just to see what Steve is talking about and to work it through yourself.
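A small Python/NumPy sketch of what an NA flag buys you when averaging: a single missing value makes a plain mean undefined, while a missing-aware mean simply skips it, much like R's `mean(x, na.rm = TRUE)`. The token list and the `parse_with_na` helper are made up for illustration.

```python
import numpy as np


def parse_with_na(tokens, na_string="NA"):
    """Convert text tokens to floats, mapping the missing-value flag to NaN."""
    return np.array([np.nan if t == na_string else float(t) for t in tokens])


x = parse_with_na(["12.1", "NA", "11.8", "12.4", "NA"])
plain_mean = x.mean()        # NaN: the missing values propagate
robust_mean = np.nanmean(x)  # mean of the three observed values only
print(plain_mean, robust_mean)
```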
sean egan

Posted Jan 8, 2009 at 3:47 PM

R is a language without the features that push you to design before you code, which I think is a bad thing, but not everyone agrees with me.

What it does have is a lot of real users doing real stats, who share code to do all the mainstream calculations. If the complexity is already done for you in well-proven libraries, there is less scope to get it wrong. If you are a climate guy and do not find the stats functions you need, you are probably going about it the wrong way. If you need a method none of these rocket scientists are using, are you sure the maths is sound?

Other than climate modeling, climate calculations are standard stuff and just do not have performance issues.

I suspect the reason all the climate stuff is not already available in R is that some folks really do not get the idea of open source and sharing. If you open up the Fortran/C/Perl/etc. source, there is always a smart ass who will port it to R, just to show you he can replace hundreds of Fortran lines with R package calls.
Hu McCulloch

Posted Jan 8, 2009 at 3:47 PM

I had posted a query about R vs Matlab on the 1/7 Unthreaded thread, which received replies there from Steve and Mark T. I’m taking the liberty of copying them here, since they are directly related to this thread. My post at #20 of Unthreaded:

Re “Data Analysts Captivated by R’s Power” (#2, Dan Hughes on Unthreaded #31, etc), should one use R or Matlab for vectorized number crunching and technical graphics, if someone else is paying the bill?

Steve likes R, since it’s free and evidently very good, but Matlab is gaining ground over other similar proprietary programs like GAUSS, which many economists like myself are accustomed to. Should I switch to Matlab or R, given that I don’t have to pay for it myself?

UC has recently posted some Matlab code, which has inspired me to brush up on Matlab of late.

Steve’s reply at #22 of Unthreaded:

20. There are so many packages for R that it’s unbelievable and the gap will spread. I can’t think of any reason to learn Matlab rather than R. But even if you change your mind, learning R isn’t wasted since Matlab code has many parallels to R.

Mark T’s reply at #23 of Unthreaded:

If it is free, I’d use MATLAB, for no other reason than the graphic support. Overall it is probably easier to use since it’s an interpreted language ala BASIC and the library of functions is ridiculously large, particularly if you include user contributed stuff at their website, which is free. It is probably slower than R until you get really good at it (lots of nuances). Keep in mind, MATLAB is really LINPACK and EISPACK, two matrix solvers (linear algebra and eigenvalue decomposition respectively).

I’m biased, however, as I’ve been using MATLAB for 20 years or more now, and I’ve never used R. Steve has used both so he might be better able to answer your question.

Mark

I concur with most others here that FORTRAN, with all its nested do loops and lack of graphics, is long obsolete. One thing I miss in FORTRAN, however, is its FORMAT statement, which allowed a nice table to be printed up easily.
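Fixed-width tables are still easy in the newer tools. As a sketch, Python's format specifications play the role of a FORTRAN statement such as `FORMAT(I6, 2F10.3)`; in R, `sprintf("%6d%10.3f", ...)` does similar work. The column names and numbers below are invented for illustration.

```python
def format_table(rows):
    """Render (year, mean, sd) rows as fixed-width text, like FORMAT(I6, 2F10.3)."""
    lines = [f"{'Year':>6}{'Mean':>10}{'SD':>10}"]  # right-aligned header
    for year, mean, sd in rows:
        # 6-wide integer, then two 10-wide reals with 3 decimals
        lines.append(f"{year:6d}{mean:10.3f}{sd:10.3f}")
    return "\n".join(lines)


print(format_table([(1998, 0.52, 0.1), (2008, 0.3, 0.125)]))
```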

Any other views on Matlab vs R?
Bob Hawkins

Posted Jan 8, 2009 at 4:10 PM

When I woke up this morning, I had no idea that I would read a good word for FORTRAN FORMAT statements!

In my experience, what separates languages is not the languages, but the available documentation. The language can be the greatest thing since the conditional branch, but if the documentation sucks or just doesn’t exist, it’s useless.
Mark T.

Posted Jan 8, 2009 at 4:45 PM

Btw, MATLAB does “compile” scripts and/or functions the first time you run them, so they tend to execute faster on subsequent runs assuming you don’t make any changes.

The biggest barrier for normal humans is entry cost. The basic program is nearly $2000. Most of the toolboxes are in the $1000 range, particularly those people in here would be interested in, but the highest is more than 7x that cost and requires several other toolboxes to operate. There’s a lot of real-time stuff, and even ports to HDL which allows FPGA programming. The latter is something I’d be interested in since I don’t know VHDL or Verilog but I regularly design stuff for FPGA implementation (I have a VHDL guy to work with). This one is also the expensive one, btw. 🙂

When I logged on I was still listed as a student (not anymore) so it would only give me the student edition prices, which are about $100 for the base program and there are about 10 or so toolboxes available all for a few hundred beans as well. You have to be a student to get that price but there is also a price for teachers/educators.

Mark

*Um, I get a few quatloos for finding this out… my first born is pledged to someone I think.
Mats Holmstrom

Posted Jan 8, 2009 at 5:03 PM

I disagree that FORTRAN is obsolete. As in all other areas it is a question of using the right tool for the right task. I agree that FORTRAN is not a language that should be used for the tasks discussed here. Reading text files with data, doing some transformations and computations on the data, and then presenting it graphically is best done in languages like R, Matlab or Python.

But when it comes to manipulating huge matrices quickly FORTRAN is still a good language. The reason is that it is the best compiled language to handle multi-dimensional arrays, since they are built into the language.

Then one has to distinguish between FORTRAN and Fortran. By that I mean FORTRAN 77 and earlier dialects on one hand, and Fortran 90/95/2003 on the other hand. In the modern Fortran dialects one can use vector notation instead of do-loops. This makes a Fortran program look much like a Matlab program. In fact, I often code in Matlab, and if execution time becomes an obstacle I switch to Fortran. In a recent example the resulting speed-up was a factor of 16.

The excellent matrix handling and the fast execution of a compiled language is the reason Fortran is still used in large simulation codes, like fluid flow simulations.
Mark T.

Posted Jan 8, 2009 at 5:08 PM

That’s the key with MATLAB – vector notation – which renders many loops obsolete. Sometimes it is a challenge to figure out how to use your vector notation to avoid a loop, however. Is this possible with R?
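One standard trick for the harder cases is to replace a loop that re-sums a sliding window with differences of a cumulative sum. The Python/NumPy sketch below (function names are hypothetical) computes a k-point trailing moving average both ways; R's `cumsum()` supports the same approach. It assumes `1 <= k <= len(x)`.

```python
import numpy as np


def moving_average_loop(x, k):
    """Nested loops: re-sum every window of length k."""
    out = []
    for i in range(len(x) - k + 1):
        total = 0.0
        for j in range(i, i + k):
            total += x[j]
        out.append(total / k)
    return np.array(out)


def moving_average_vectorized(x, k):
    """No loops: each window sum is c[i + k] - c[i] for a cumulative sum c."""
    c = np.cumsum(np.concatenate(([0.0], np.asarray(x, dtype=float))))
    return (c[k:] - c[:-k]) / k
```

The vectorized version does one pass over the data regardless of the window length, while the loop version does k additions per output value.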

I no longer have any need for FORTRAN, since I do embedded signal processing; it’s C/C++ with matrix libraries if needed (and plenty of assembly routines, since compilers don’t deal with SIMD architectures very well). I understand FORTRAN’s utility, however.

Heck, I even learned FORTH in college. Whatever happened to that language?

Mark
GF

Posted Jan 8, 2009 at 5:17 PM

To search for R-ish things [thanks also to the slashdot crowd]: http://www.rseek.org/
John A

Posted Jan 8, 2009 at 5:55 PM

It’s been one of my long-term projects to learn R (and statistical analysis, but still…), but I’m trying to understand mathematical models using MathCAD (which is simple but not very flexible) and I’m looking at Mathematica or Maple in the near future.

R presents a steep learning curve (and yes, I learned Fortran-77 at university in the ’80s as well) so it’s going to take some time. I did buy a book called “Learning Statistics with R” – but where did I put it?
Demesure

Posted Jan 8, 2009 at 6:25 PM

There is a demo video of R setup here: http://pichuile.free.fr/tutorial-R/tutorial.htm
Using the editor TinnR to write code is way more comfortable & efficient than Notepad.
- Scott Lurndal
  
  Posted Jan 8, 2009 at 6:50 PM
  
  Re: Demesure (#27),
  
  _Any_ editor is better than notepad. Note that Word is not an editor :-).
  
  Personally, I like vim (which is available for windows, linux and OS/X).
sonicfrog

Posted Jan 8, 2009 at 7:33 PM

I’m a Gnome Geek and like Gedit.
Mark T

Posted Jan 8, 2009 at 8:11 PM

I just used MATLAB’s editor, which handles just about every type of code file. Eclipse has a good one that is FREE, too. 🙂

Mark
- Not sure
  
  Posted Jan 8, 2009 at 9:08 PM
  
  Re: Mark T (#30), Coolness. Maybe I’ll actually get around to trying it sometime.
tesla

Posted Jan 8, 2009 at 10:01 PM

I’ve used many mathematics platforms: BASIC, C, MATLAB, Scilab, LabVIEW, Mathematica, R, MAPLE, Excel, Mathcad, Minitab…

They all have their uses so you can’t categorically say one or the other is “the best” in general. For data analysis I like MATLAB the best, but if you can’t get it for free (or for the student price) I’d go with R. Mainly because the vectorized operations keep the focus on the math and not so much on the programming per se. Both have good graphics and data visualization support as well.

MATLAB is more for engineering whereas R seems to be geared more towards pure data analysis though of course there is significant overlap. The signal and image processing support in particular is far superior for MATLAB (though there might be packages for R that I just don’t know about). I also like that MATLAB has a built-in compiler and you can embed your code directly into hardware without a translation to C or assembly language. MATLAB’s GUI editor has been pretty handy on occasion, too. So MATLAB is more versatile but it can come at a steep price.

MATLAB does have a number of handy functions for reading data tables (urlread(), textread(), etc.). I also like the MATLAB editor better than the R one; it’s much easier on the eyes and that makes a big difference when you’re looking at a computer screen all day.

Nonetheless I use R at home because it’s nearly as good as MATLAB for basic data analysis and because it’s free. I found R to be really easy to learn.
Alan Wilkinson

Posted Jan 8, 2009 at 11:16 PM

Amazing what little old NZ has given the world, isn’t it?
Not sure

Posted Jan 8, 2009 at 11:35 PM

This looks interesting.
gs

Posted Jan 9, 2009 at 1:10 AM

The price of R is attractive and so is the enthusiastic support from academia, but the last time I checked (maybe a year ago), the R community had not set a high priority on SMP. I’m not a strong programmer & don’t want the hassle of running a cluster, but I intend to take advantage of SMP via evolving multicore CPUs.

I’ve returned to Mathematica from Matlab. No doubt in response to competition from Matlab, the Mathematica people have made their software more accommodating to numerical work–and they have developed SMP versions.

When R follows suit SMPwise, I will invest the time to learn it with an eye toward possibly switching from Mathematica.
sean egan

Posted Jan 9, 2009 at 1:32 AM

The real problem with MATLAB is not that YOU have to pay for it. Your employer or University may pay for you. The problem is that the people who may want to share/look at what you are doing have to pay.
- Mark T
  
  Posted Jan 9, 2009 at 2:59 AM
  
  Re: sean egan (#36),
  
  The problem is that the people who may want to share/look at what you are doing have to pay.
  
  Uh, no, not true. The programs are simple text file scripts so you only need an editor to view them. Execution is different, and requires either the base program or whomever generates the file needs the ability to generate executables (toolbox).
  
  Mark
  - Ross Berteig
    
    Posted Jan 9, 2009 at 3:29 AM
    
    Re: Mark T (#40),
    
    The programs are simple text file scripts so you only need an editor to view them. Execution is different, and requires either the base program or whomever generates the file needs the ability to generate executables (toolbox).
    
    But execution is critical if replication is the goal. If you are lucky, then only features available in a recent build of Octave were used, and you only have to pay the non-$$$ costs of installing and learning to use another free tool. But if any of those fancy, expensive, and optional modules were used, then the cost to tool up to replicate a calculation can be unreasonably high.
    
    I’d say that is a really strong argument in favor of preferring to use R when R’s features fit the problem at hand, even if MATLAB is also available. My impression is that learning R is about the same effort as learning MATLAB; it’s just the capital outlay that differs. And of course, the pool of people that can replicate and/or audit your work is larger if the tools to do so are free.
    - Mark T
      
      Posted Jan 9, 2009 at 11:29 AM
      
      Re: Ross Berteig (#41), And, as I said, it is possible to create executables. In the world of people that use MATLAB, their colleagues typically do as well so there’s never an issue. Plus, you can get the freeware versions to run MATLAB scripts anyway. They aren’t as up to date so there may be incompatibility issues with some functions, but nearly all of the basic stuff runs identically. This is partly because years ago Mathworks published the m-file scripts for all of their functions except the basic primitives (which are generally part of a C library anyway). As a result, most of the actual text for the various toolboxes is known (albeit copyrighted, but that’s not difficult to get around since the algorithms are public domain).
      
      Mark
Demesure

Posted Jan 9, 2009 at 2:21 AM

TinnR does syntax coloring for R scripts, a must-have feature for comfortable coding.
Ross Berteig

Posted Jan 9, 2009 at 2:35 AM

There is an open source clone of MATLAB, known as GNU Octave. Their goal seems to be to keep up with MATLAB feature creep, but it is a Red Queen’s Race, and the MATLAB folk have a lot more resources behind them.

A long time ago, I and my company got hooked on MathCAD as an engineer’s worksheet. It ran well on Windows 3.1, so that was quite a while ago. Recently, I had the urge to have a copy at home, and went looking for the current price. Wow. We must have paid under $250 a seat when we bought version 2.0. Now, it is competitive with MATLAB… even if it is just a touch cheaper, it is still well outside my budget.

For just plotting data without any analysis at all, I have been very happy with gnuplot (not related to either GNU or the animal). It is also open source, and supports a wide array of scientific plotting styles. I’ll go to the effort to export data from Excel to use with gnuplot rather than put up with the limitations of Excel’s plotting feature.

I’ve dabbled with R because of this site. I am not a statistician, and I know I never will be. But R looks like it can be used for lots of non-stats visualization, so I’ll continue to dabble. It is really hard to beat its plotting tools, so it is clearly worth the investment in learning curve. Once I finally grok its worldview, it will probably displace most of my uses for gnuplot.

I did finally satisfy my longing for something that approached the simplicity of MathCAD from its early days. I found the Euler Math Toolbox, “a powerful, versatile, and open source software for numerical and symbolic computations” according to its home page. I like that it can handle both numeric and symbolic calculations since I haven’t had to figure out an integral by hand since college. It fits the bill for my typical use case (which is laying out an annotated calculation justifying a component selection) quite nicely. Typically, R would be seriously overkill for that kind of thing.
- Not sure
  
  Posted Jan 9, 2009 at 10:47 AM
  
  Re: Ross Berteig (#38), I used MathCAD to do my homework in college, and loved it. I have very bad handwriting, to the point where I myself would have trouble deciphering what I had written sometimes. MathCAD made me look tidy.
  
  I also looked at it recently and cringed at the price. Not in the ballpark for the naive dabbling I’d want to do. Thanks for the link to the Euler Math Toolbox. It looks interesting.
David Stockwell

Posted Jan 9, 2009 at 2:37 AM

I would use R more, but it’s problematic for building web applications.

Firstly, my provider won’t let me install it.
Secondly, running it with Apache as a script is not well supported.

I use PHP a lot now. If you want to use vectors, you can.
Functions like array_map and array_fill allow it.
Maths libraries like http://www.phpmath.com exist for stats.
Mike D.

Posted Jan 9, 2009 at 3:30 AM

R is the open source version of S, which was invented at Bell Labs. S was sold to one company and eventually became a commercial product of Insightful Corp. The latest version is S-Plus 8, I think. But like many others I migrated to R years ago because of the cost (free) and the tenacity of the R shareware community. R is primarily a statistics tool, and the R library of packages has every stat function ever invented. It is the cat’s meow for data mining.
David Wright

Posted Jan 9, 2009 at 4:54 AM

Systems like R, S, Matlab, and Mathematica are great and fill important niches. But people who claim that this makes languages like FORTRAN obsolete are making fools of themselves. Even the best mathematical C libraries are not as fast as the best mathematical FORTRAN libraries.

To cite just one example: if you have a massive, high-dimensional system of partial differential equations that you need to turn into a ridiculously large matrix with a special, obscure structure and solve on a parallel supercomputer, you would be crazy to do anything other than use FORTRAN libraries that have been under development and heavy use for 30 years. That is, by the way, precisely the sort of problem that GCM climate modelers face.

So by all means, for everyday analysis of small data sets, use whatever package makes your life easiest. But don’t delude yourself into thinking you’ve found the be-all, end-all solution to all 21st century scientific programming problems.
- Dave Dardinger
  
  Posted Jan 9, 2009 at 7:48 AM
  
  Re: David Wright (#43),
  
  Well, if speed is that crucial, and climate modeling all that crucial, we might as well go whole hog and have special chips devised to maximize things. Expensive yes. But not as expensive as eliminating carbon fuels by a few orders of magnitude. Cheap insurance.
- MarkB
  
  Posted Jan 9, 2009 at 9:17 AM
  
  Re: David Wright (#43),
  
  The example you give – the super-duper supercomputer problems – is at least as much a “niche” as statistical analysis or graph generation. As far as this site is concerned, I think it’s fair to say that FORTRAN is obsolete. Would you teach someone FORTRAN to do statistical studies of paleoclimate or contemporary global temps? There was a time when you would have. That time seems to have passed. No doubt FORTRAN will live on – just not in this particular world of study.
- Not sure
  
  Posted Jan 9, 2009 at 1:08 PM
  
  Re: David Wright (#43),
  
  But people who claim that this makes languages like FORTRAN obsolete are making fools of themselves. Even the best mathematical C libraries are not as fast as the best mathematical FORTRAN libraries.
  
  Are you sure?
  
  8) Are prebuilt Fortran77 ref implementation BLAS libraries available?
  
  Yes, you can download a prebuilt Fortran77 reference implementation BLAS library or compile the Fortran77 reference implementation source code of the BLAS from netlib.
  
  Note that this is extremely slow and thus we do not recommend it: you should use optimized BLAS whenever possible, see FAQ 5.
  
  So what is optimized BLAS written in? It’s hard to tell. It looks like ATLAS is written in C, at least partially. The AMD optimized implementation appears to be written in assembly (scroll to the bottom.)
  
  Someone asked about custom processors for numerical applications. Manufacturers introduced extensions to general-purpose CPUs called SIMD (single instruction, multiple data) to enhance the multimedia capabilities of their products. These extensions can also be used to accelerate scientific mathematics. Video cards designed for high-performance 3D animation can also be used for scientific computing.
  
  I found in looking at BLAS, CUDA, and SIMD that there are often interfaces to FORTRAN and Matlab, but seldom any mention of R. There are almost always interfaces to C and/or C++ as well.
  - David Wright
    
    Posted Jan 9, 2009 at 2:50 PM
    
    Re: Not sure (#56),
    
    Actually, I can answer your question about netlib’s BLAS. It is written in FORTRAN, compiled, then tweaks are applied for optimization at the machine-language level. They have the warning about downloading the FORTRAN code because they want you to download the optimized binaries rather than compiling the FORTRAN code yourself. They don’t have that warning for other languages because there is no “reference implementation” in languages other than FORTRAN.
Dan Hughes

Posted Jan 9, 2009 at 7:39 AM | Permalink

All the comments in which the FORTRAN / Fortran language is characterized as obsolete, and worse, simply display a lack of being up-to-date. I suggest tracking down and reading more recent information. The language continues to evolve in significant ways.

Hughes Law of Scientific and Engineering Computer Codes: The numbers of useful, and actually used, codes written in FORTRAN or Fortran will never decrease.

*Languages don’t write bad code, people write bad code.

**BASIC is my most favorite computer language, and GOTO is my most favorite statement.

***Real programmers use Assembly Language and 8-inch floppy discs.
Doug M.

Posted Jan 9, 2009 at 7:56 AM | Permalink

A nice discussion of low level vs. high level programming techniques and requirements on the IEEE site here. Some surprising results when comparing Matlab vs. C programs when computing 8K x 8K matrices and 32K x 32K matrices.

The summary is that Fortran is still the language of choice in extreme cases where raw performance is the primary driver of success, so it could be argued that the GCMs require the code to be written in Fortran and run on today’s most powerful parallel supercomputers. I don’t know; I haven’t seen the requirements specs. The language itself is not a cure-all, of course. Expert parallel programming skills in Fortran would be required to properly leverage the architecture of the machines the programs run on, and those skills are not easy to find or cultivate. However, that can hardly be the case for GISSTemp or the Mann-o-matic Hockey Stick machine. I can’t imagine either of these programs would require such enormous computing power, and there’s no reason they couldn’t be rewritten in a higher-level, open source, platform-agnostic language like R.
Hu McCulloch

Posted Jan 9, 2009 at 8:51 AM | Permalink

Dan Hughes, #45 writes,

All the comments in which the FORTRAN / Fortran language is characterized as obsolete, and worse, simply display a lack of being up-to-date.

I admit I haven’t used FORTRAN since FORTRAN 77 was the norm for mainframes, and baby versions were being introduced for PCs. At the time, a vectorized version was being introduced for use on supercomputers, but PCs were becoming more convenient and adequate for my purposes, and GAUSS was a lot more powerful with them.

Back then, I was told by a top systems analyst that although he had no idea what the programming language of choice would look like in 25 years, he was certain that it would be called “FORTRAN”! I’m glad to hear the name is still alive and well!

GOTO is my most favorite statement.

Mine too! When I tried writing some Matlab 4 back in the 90’s, I was shocked to find it had no GOTO statement, and complained noisily on a Matlab user group. The Matlab people agreed to hold their noses and incorporate it into future editions, but now I can’t find it in Matlab 7 HELP. Does it have a different name, or did they take it back out?

The first machine I ever programmed for was the Bendix G-15. I don’t think the language even had a name. Then FORTRANSIT came in, FORTRAN II, FORTRAN IV, and eventually FORTRAN 77. FORTRANSIT was FORTRAN Translatable into IT, IT being the assembly language for an early 60’s IBM machine. As I recall, FORTRAN IV introduced multi-dimensional arrays, and FORTRAN 77 could handle Boolean statements.
MartinGAtkins

Posted Jan 9, 2009 at 9:19 AM | Permalink

Since I have had to go back to Windows, the best editor I’ve found is below.

Notepad++ is a free (as in “free speech” and also as in “free beer”) source code editor and Notepad replacement that supports several languages. Running in the MS Windows environment, its use is governed by GPL Licence.

Here
DAV

Posted Jan 9, 2009 at 9:29 AM | Permalink

@47: The linked IEEE site is specifically addressing parallel processing, which has a different set of requirements from those most readers here are likely to encounter. FORTRAN is the language of choice in this area because it’s a relatively restricted language (a low number of possible constructs, so less freedom of expression), which makes analyzing the program for hooks into MPI and other parallel processing schemes all that much easier.

In just a little under 50 years the “definition” of low-level has undergone quite a transformation. To me, it’s a gradation from programming specifically in binary at the low end to programming in a language geared for the task at hand at the high end. I don’t consider wiring (like with the old Tab machines or even VHDL) as programming but, if it is considered so then it’s the lowest.

It should be remembered that languages are tools and, as such, none is better than another outside of context. Is a semi-truck better or worse than a pickup truck? Is either better than a sedan? Is a chainsaw better or worse than a rasp? R seems well suited to performing data analyses but is one of the worst tools for implementing an embedded control system or network router.

Also, comparisons of speed (“Is C faster or slower than X?”) between languages are outright silly. Yes, some languages have a preference (like defaulting to p-code) that can affect speed but, in reality, speed is implementation dependent rather than an inherent quality. Also, things like “speed” need context. The execution time often contributes the least to the total time from concept to answer. A chainsaw can really speed up the task of cutting logs but would be quite slow (not to mention extremely awkward) when applied to the task of cutting pieces for table chairs or building a ship in a bottle.
- BarryW
  
  Posted Jan 10, 2009 at 8:44 AM | Permalink
  
  Re: DAV (#51),
  
  You’ve hit the nail on the head. Languages are tools. The problem is that when you have a hammer everything starts looking like a nail. You can use them for a task they weren’t designed for and they’ll usually work, but inefficiently. Perl, for example, is great for tasks where text processing predominates, especially when you need to do a quick hack. I’ve done the same thing in other languages because that’s what I had available, but Perl is optimized for that sort of stuff. I wouldn’t use it to do math, even if I probably could.
  
  Unfortunately, we tend to fall back on what we know rather than utilizing the better tool. Just because FORTRAN has fast libraries does not mean it’s the right tool for the job, especially when the code is expected to be around for a long time. Even recompiling FORTRAN is fraught with risk because there is no single standard. Compilers may or may not give you the same answer from the same code, even if it compiles. Better to use a modern language and develop an API to use those libraries in a structured/object-oriented framework. Then your code is maintainable and you’ll be less likely to recreate something that looks like Model E.
  - Dan Hughes
    
    Posted Jan 10, 2009 at 10:36 AM | Permalink
    
    Re: BarryW (#59),
    
    Even recompiling FORTRAN is fraught with risk because there is no single standard.
    
    Nope, there is a single standard for the language; how can there not be? You can look it up on the InterWebs.
    
    Different manufacturers of compilers seem to almost always add in extensions to the standard and that has the potential to introduce problems.
    - BarryW
      
      Posted Jan 10, 2009 at 12:30 PM | Permalink
      
      Re: Dan Hughes (#60),
      
      Yes, there’s a “standard” but if no one really follows it?
      
      I stand corrected, though: there are standards for the language, just innumerable variants that don’t conform.
  - DAV
    
    Posted Jan 11, 2009 at 11:09 AM | Permalink
    
    Re: BarryW (#59),
    
    Unfortunately, we tend to fall back on what we know rather than utilizing the better tool. … Compilers may or may not give you the same answer from the same code, even if it compiles. Better to use a modern language and develop an API to use those libraries in a structured/object-oriented framework. Then your code is maintainable and you be less likely to recreate something that looks like Model E.
    
    I’ve been at this for 40+ years. Here’s the way I view things. I’d also like to note that I am not necessarily defending the coding practices at GISS and other places, but I’ve been around the block more than a few times and can sympathize with being subjected to 20-20 hindsight. It’s often too easy to forget (or be unaware of) the complex reasons driving implementations when the view is restricted by lack of experience. What’s worse is the hubris in believing that one’s own trade-offs are the epitome of design.
    
    1) It shouldn’t come as a surprise that most programming projects are rarely undertaken with the expectation that one’s grandchildren will still be using the same code.
    
    2) I disagree about the lack of standard for FORTRAN. I’m currently working on a task where the last software update was nearly 20 years ago. A large portion is in Fortran 77. Even though the last version of FORTRAN I used was FORTRAN IV, I have had no trouble reading the code. FORTRAN is a relatively simple language — about one step above writing in assembly code — it’s hard to screw it up.
    
    3) Changing compiler implementations is ALWAYS fraught with potential disaster. It’s not completely unheard of for version N+1 to break code written using version N. An early version of Matlab did this. For this very reason, it’s common practice at NASA and the U.S. military to freeze the compiler version for critical projects.
    
    4) The benefit of ‘modern languages’ is primarily ease of use where the language semantics provides a service that is genuine tedium when done by hand (e.g., generation and bookkeeping of object-oriented code; vectorized operations; etc.). However, because of the increased complexity of the language, it’s often too easy to write code that has surprising interpretations. Perl has this problem, albeit Perl is more like a Swiss Army knife than a fine tool. Full checkout of any program, be it compiler or not, is an NP-hard problem. The increased complexity of ‘modern’ languages increases the risk of surprises.
    
    5) I’m curious, if one is to switch to a more ‘modern’ language every so often why should the code be ‘maintainable’? After all, under that philosophy, it’s essentially discardable.
    
    In truth, ‘maintainable’ is a fluid concept and hard to pin down. It really means “sufficiently documented”. “Maintainable” by whom? The original designers? An extremely average designer? An intern? Anyone who comes along? Each of these requires increasing levels of documentation. Going all-out can raise the cost of development easily by a factor of 3. That alone could be a show-stopper for many projects. I could argue that GISS code is ‘maintainable’ if it ever had more than one release. Everybody keeps SOME documentation. Just because it wasn’t delivered with the code doesn’t mean it’s non-extant.
    
    Documentation isn’t the only cost. When switching languages it becomes necessary to rewrite all of the code. Rewriting ALWAYS introduces the risk of getting something unintended. Not to mention the strong temptation to add ‘enhancements’ – the cost of which gets propagated to places other than code under development. Remember, full check-out is NP-hard.
    
    To me, documentation means answering the important what’s and why’s (WHY was it written THIS way? WHAT are the REAL goals to be accomplished by THESE steps?) IOW: the things that can’t be gotten from the pretty-prints supplied by doxygen and the like.
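Point 4 above names vectorized operations as one of the services a higher-level language provides, and warns that extra language machinery makes surprising interpretations easier. R shows both sides. In the sketch, `dot_loop()` and `dot_vectorized()` are hypothetical helpers that compute the same dot product; the last line shows R's recycling rule, which silently reuses the shorter vector when the longer length is a multiple of the shorter one.

```r
# Dot product written as an explicit loop.
dot_loop <- function(x, y) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i] * y[i]
  }
  total
}

# The same computation using R's vectorized arithmetic.
dot_vectorized <- function(x, y) sum(x * y)

x <- c(1, 2, 3)
y <- c(4, 5, 6)
dot_loop(x, y)        # 32
dot_vectorized(x, y)  # 32

# A "surprising interpretation": vectors of unequal length are recycled
# without any warning, because 4 is a multiple of 2.
c(1, 2, 3, 4) * c(10, 20)   # 10 40 30 80
```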
    - BarryW
      
      Posted Jan 11, 2009 at 3:54 PM | Permalink
      
      Re: DAV (#65),
      
      Remember, full check-out is NP-hard.
      
      Which is one of the reasons why the air traffic control system still has code written in JOVIAL and BAL from the sixties.
      
      I don’t remember the author (it might have been Parnas) who said that even well-designed code tends to deteriorate over time as it is patched and modified. Eventually it reaches a point where you should start over, because there is a high probability that for every bug you fix you will introduce another.
      
      It shouldn’t come as a surprise that most programming projects are rarely undertaken with the expectation that one’s grandchildren will still be using the same code.
      
      Some programming projects I’ve worked weren’t even expected to be in use in the next decade, let alone the next century! Yet here they are.
      
      I’m curious, if one is to switch to a more ‘modern’ language every so often why should the code be ‘maintainable’? After all, under that philosophy, it’s essentially discardable.
      
      I’m just advocating that you shouldn’t use a language because it’s the one that you’re familiar with. The last project I worked on started just as JAVA was being released. We had major arguments over whether to use C or Java. We all knew C but we finally chose Java and for our purposes it was the correct choice. For other projects it might not have been. Getting newbies up to speed on the Java code has been a lot easier than other similar projects that were as complex but written in C.
    - DAV
      
      Posted Jan 11, 2009 at 8:32 PM | Permalink
      
      Re: BarryW (#67),
      
      Java code has been a lot easier than other similar projects that were as complex but written in C
      
      I consider C one step above FORTRAN. (FORTRAN was my second language, BTW. The first was ALGOL 60). I think in an object-oriented way and prefer that the language keep track of the details. IOW: the computer should be doing the work. My outside-of-context choice is usually C++ because I can dovetail prior code written in C when necessary with little or no difficulty. Java is very similar to C++. At times, this can lead to a toss-up. But these are simply preferences. The final choice is dictated by specific requirements (sometimes by customer insistence).
      
      I know quite a few languages. Many I haven’t used in quite a while like: COBOL, Formula ALGOL, Simula, Bliss, IPL-V, Lisp in at least three variants, TREWQ and QWERT (early compiler generators), PL/1, a lot of CPU specific assembly languages, and many more. Some languages (which I won’t name) cause me to wonder why anyone thought them necessary at all — outside of ego-stroke generators, that is.
    - BarryW
      
      Posted Jan 12, 2009 at 12:48 PM | Permalink
      
      Re: DAV (#68),
      
      There seems to be a proliferation of languages since the advent of the Web which is not necessarily a “good thing”. One thing I like about Java is the numerous packages that are available and the (mostly) good garbage collection. As you say let the computer do the work.
      
      One thing about object oriented languages is that they force you into doing things the way you should have done them anyway. My former boss has managed to take our original project and pull much of the code out in reusable packages that are being utilized in other projects. It required a lot of paradigm shifts in our thinking to change our old ways, and some of that was due to the language involved.
Yudhi Baskoro

Posted Jan 9, 2009 at 9:43 AM | Permalink

Wow, I think I’m such an idiot: I still can’t figure any of it out after reading this a few times.
DAV

Posted Jan 9, 2009 at 10:08 AM | Permalink

The first machine I ever programmed for was the Bendix G-15

Wow! We had a G-15 in the room next to the Athena (a drum machine used for missile launches). I remember it was great for heating the room on cold winter nights. In fact, we could only run it IN the winter because the containing room had no air conditioning. We also had a Burroughs B-250, which wasn’t much better than a tabulating machine.

GOTO is my most favorite statement.

GOTO was associated with many programming sins. It was thought at one time that removing it would lead to better programming practices. Unfortunately, doing so was as much a silver bullet as prohibition. Even C has a goto.
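For comparison with the Matlab question above: R has no goto statement at all. The structured substitutes are `repeat`, `break` and `next` (plus `return()` inside a function). A minimal sketch; `first_exceeding()` is a hypothetical example function, not something from the thread.

```r
# Return the first position at which the running total of x exceeds limit,
# or NA if it never does. Missing values are skipped.
first_exceeding <- function(x, limit) {
  i <- 0
  total <- 0
  repeat {
    i <- i + 1
    if (i > length(x)) return(NA_integer_)  # ran off the end
    if (is.na(x[i])) next                   # skip to the next pass
    total <- total + x[i]
    if (total > limit) break                # leave the loop
  }
  as.integer(i)
}

first_exceeding(c(3, NA, 4, 5), 6)   # 3
first_exceeding(1:3, 100)            # NA
```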
sean egan

Posted Jan 10, 2009 at 4:05 AM | Permalink

David Wright
Absolutely the right way to go for max performance. You have source you can read and port to the latest hardware, and code optimised by the compiler, AND THEN optimised again by a human for the target environment. So whatever you started with, the critical 5% of code which is CPU intensive is in fact close to or in assembler. It is important that the unoptimised reference model is interchangeable function by function, as porting should be a regular event, both for your hardware upgrades and for sharing with folks not using your particular hardware/operating system. Plus, if you play your cards right, the platform may be 1 million screen savers, not a supercomputer.

Of course, if you just want to reproduce GISS you do not need to be optimised. I do not know the actual run time, but with 5 degree cells and 1 to 100 stations per cell, as a guess it looks no worse than rendering a home movie. Get the biggest desktop PC in the department, and run it when you go home at night. I would be surprised if it took more than overnight. You only have to get it to complete once a month.
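The run-time guess above rests on simple arithmetic: 5-degree cells give 72 x 36 = 2,592 cells worldwide, so even 100 stations per cell is about a quarter of a million series. A sketch of the cell bookkeeping in R, assuming plain latitude/longitude boxes and an unweighted mean of the stations in each cell. This is not GISTEMP's actual gridding or averaging procedure; `cell_id()` is a hypothetical helper and the station values are dummy data.

```r
cell_size <- 5                                   # degrees, as in the comment
n_cells <- (360 / cell_size) * (180 / cell_size)
n_cells          # 2592
n_cells * 100    # 259200: upper bound at 100 stations per cell

# Map latitude/longitude (degrees) to a 0-based cell index.
cell_id <- function(lat, lon, size = 5) {
  n_row <- 180 / size
  n_col <- 360 / size
  row <- pmin(floor((lat + 90) / size), n_row - 1)   # lat = 90 joins the top row
  col <- pmin(floor((lon + 180) / size), n_col - 1)  # lon = 180 joins the last column
  row * n_col + col
}

# Hypothetical station readings, for illustration only.
stations <- data.frame(
  lat  = c(1, 2, 50),
  lon  = c(1, 3, 10),
  temp = c(10, 12, 5)
)
cell_means <- tapply(stations$temp, cell_id(stations$lat, stations$lon), mean)
cell_means       # cell 1332 -> 11, cell 2054 -> 5
```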
- Not sure
  
  Posted Jan 10, 2009 at 11:10 AM | Permalink
  
  Re: sean egan (#58), It’s simply not true that the optimized FORTRAN version of BLAS is faster than its C plus assembly counterparts. I wrote a message complete with links to performance numbers from the very authors of the FORTRAN version that showed this. WordPress decided it was spam and put it up for moderation a couple of days ago now.
  - David Wright
    
    Posted Jan 11, 2009 at 3:43 AM | Permalink
    
    Re: Not sure (#61),
    
    I’m afraid this is turning into a programming language war of religion, but it needn’t. I’m really trying to give you objectively good information here. I program much more in C than in FORTRAN, and all other things being equal prefer the former as more transparent. But there really is a difference in language design and compiler performance when it comes to high-performance scientific programming.
    
    There is no difference in speed in calling the BLAS libraries from FORTRAN or C because you are ultimately calling the same bits. But you might ask yourself why the BLAS reference implementations were written in FORTRAN and not C. Of course, one reason is that most workers in the field are used to FORTRAN. But another reason is that your average FORTRAN compiler will likely get you closer to ideally optimized code than your average C compiler.
    
    There are a number of reasons for this. One big one is that matrices are first-class objects in FORTRAN, so the compiler can recognize their use and optimize it. To cite just one example of how this plays out, you will notice that the usual array-of-arrays construct (double[][]) that C programmers use to implement matrices stores them row-wise, i.e. M[r][c+1] follows M[r][c]. But conventional matrix algebra uses column vectors, so it is usually the columns of a matrix that we want to operate on sequentially. FORTRAN, for precisely this reason, stores matrices column-wise, i.e. M[r+1][c] follows M[r][c].
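R (like FORTRAN and Matlab) also stores matrices column-wise, so the layout described above can be inspected directly from an R session. In this sketch, `as.vector()` exposes the underlying storage order, and `byrow = TRUE` only changes how the input values are read in, not how the result is stored.

```r
m <- matrix(1:6, nrow = 2)   # fills column by column
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

as.vector(m)   # 1 2 3 4 5 6: the underlying storage order
m[2, 1]        # 2: stored immediately after m[1, 1]
m[1, 2]        # 3: stored nrow(m) = 2 slots after m[1, 1]

# byrow = TRUE only changes how the input is read; storage stays column-major.
mb <- matrix(1:6, nrow = 2, byrow = TRUE)
as.vector(mb)  # 1 4 2 5 3 6
```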
    
    There are other examples, but I’ll stop there. Again, I’m not trying to convince you that FORTRAN is intrinsically superior to C (or Matlab, etc.). I’m only trying to convince you that there are good reasons to expect particular code constructs to compile to faster code under some languages than under others, and FORTRAN’s forte happens to be scientific computational constructs.
    - DAV
      
      Posted Jan 11, 2009 at 12:36 PM | Permalink
      
      Re: David Wright (#63),
      
      But conventional matrix algebra uses column vectors, so it is usually the columns of vectors that we want to operate on sequentially. FORTRAN, for precisely this reason, stores matrices column-wise, i.e. M[r+1][c] follows M[r][c].
      
      Yes, it is true that some languages are at least specified in a way that seems inefficient. Take R (and S), for example, in which parameters (like arrays and vectors) can only be passed by value, which, in turn, is really inefficient if the actual operations involving those parameters occur deep in a chain of function calls. There is evidence, though, that the implementation of R doesn’t follow the strict language definition (which, in this specific context, is sometimes referred to as ‘lazy’ adherence).
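The mechanism behind this is R's copy-on-modify strategy. The language guarantees value semantics, but passing a vector to a function does not copy it; a copy is made only when the function first modifies the argument. A function that only reads a large vector therefore pays no copying cost, while one that modifies it does. The sketch shows only the visible semantics (when the copy happens is an implementation detail); `bump_first()` is a hypothetical example function.

```r
# R has pass-by-value semantics: changing an argument inside a function
# changes only the function's copy.
bump_first <- function(v) {
  v[1] <- v[1] * 100
  v
}

x <- c(1, 2, 3)
y <- bump_first(x)
x   # 1 2 3: the caller's vector is unchanged
y   # 100 2 3
```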
      
      But really: think about it. Whether something is stored column-wise or row-wise is really the programmer’s viewpoint of structure being impressed upon an unstructured reality. If necessary, one implements the transpose to take advantage of compiler implementation. Often, simply changing the order of loops is all that is required. Not being able — or being unwilling — to do this demonstrates a remarkable rigidity of thought. These concept changes are necessary, in general, for EVERY language. It’s highly unlikely any given language will precisely match the preferred concept of the problem at hand.
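The point about changing the order of loops can be illustrated with two loop orders over the same (column-major) R matrix. Both give the same total. In a compiled language the column-first order, whose inner loop walks down a column, is the one that visits contiguous memory; in interpreted R the loop overhead swamps any cache effect, so this sketch demonstrates equivalence, not speed. `sum_col_order()` and `sum_row_order()` are hypothetical helpers.

```r
# Sum every element of a matrix, inner loop walking down each column.
sum_col_order <- function(m) {
  total <- 0
  for (j in seq_len(ncol(m))) {        # outer loop over columns
    for (i in seq_len(nrow(m))) {      # inner loop follows contiguous storage
      total <- total + m[i, j]
    }
  }
  total
}

# Same sum, inner loop striding across each row.
sum_row_order <- function(m) {
  total <- 0
  for (i in seq_len(nrow(m))) {        # outer loop over rows
    for (j in seq_len(ncol(m))) {      # inner loop jumps nrow(m) slots each step
      total <- total + m[i, j]
    }
  }
  total
}

m <- matrix(as.numeric(1:12), nrow = 3)
sum_col_order(m)   # 78
sum_row_order(m)   # 78
```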
      
      Besides, being able to express an equation or concept exactly as it appears in a math text is more efficient for coding (read ‘convenience’) than it is for resulting code speed. More often, changing the algorithm is where efficiencies are gained. Most inefficiencies occurring during code translation are in the noise by comparison. In Neapolitan’s book, “Learning Bayesian Networks,” he shows one such equation that can be quite inefficient if implemented as written but 20 times faster if viewed slightly differently regardless of language chosen.
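The equation from Neapolitan's book isn't reproduced in the comment, so as a stand-in here is a textbook case of the same idea: evaluating a polynomial term by term as written versus Horner's rule, which regroups it as c1 + x(c2 + x(c3 + ...)). Both return the same value; Horner's form needs one multiply and one add per coefficient and no powers. `poly_as_written()` and `poly_horner()` are hypothetical helpers.

```r
# Polynomial p(x) = coef[1] + coef[2] x + ... + coef[n] x^(n-1).
poly_as_written <- function(coef, x) {
  total <- 0
  for (k in seq_along(coef)) {
    total <- total + coef[k] * x^(k - 1)   # computes a power for every term
  }
  total
}

poly_horner <- function(coef, x) {
  total <- 0
  for (k in rev(seq_along(coef))) {
    total <- total * x + coef[k]           # one multiply and one add per term
  }
  total
}

coef <- c(2, -3, 0, 1)    # 2 - 3x + x^3
poly_as_written(coef, 2)  # 4
poly_horner(coef, 2)      # 4
```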
    - Not sure
      
      Posted Jan 12, 2009 at 12:54 PM | Permalink
      
      Re: David Wright (#63), If what you say about FORTRAN compilers is true, why is the pure-FORTRAN reference implementation of BLAS so slow?
      
      I’m not trying to say that FORTRAN is a bad choice for scientific programming. I’m just saying it’s not the only choice. I don’t want anyone to think that they have to learn FORTRAN if they want to be “serious”.
      
      What you say about FORTRAN’s expressiveness for mathematical constructs is even more so for Matlab and R. That is why it’s the choice here. That is why all the folks who do stats at my workplace use either Matlab or R.
      
      Another advantage of R and Matlab over compiled languages like FORTRAN and C is that they do away with the tedious build-run-debug cycle. I’ve been trying out some of Steve’s scripts, and it took me a minute to get used to the idea that when something fails, you can modify it and re-run just the failed part. Whatever you’ve previously loaded into memory is still in memory. No need to re-run the whole thing from the beginning.
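The re-run-only-what-failed workflow described above is often made explicit in R scripts by guarding slow steps with `exists()`, so re-running the script skips anything already in memory. A minimal sketch; `slow_load()`, `raw_data` and `load_count` are hypothetical names standing in for a real loading step, and the data values are dummies.

```r
load_count <- 0

# Hypothetical slow step, standing in for reading a large data file.
slow_load <- function() {
  load_count <<- load_count + 1
  data.frame(year = 2001:2003, value = c(0.1, 0.3, 0.2))
}

# Re-running this line does nothing if raw_data is already in memory.
if (!exists("raw_data")) raw_data <- slow_load()
if (!exists("raw_data")) raw_data <- slow_load()   # second run: skipped

load_count   # 1
```

Calling `rm(raw_data)` forces the next run to reload.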
Tony

Posted Jan 11, 2009 at 5:50 AM | Permalink

I think the post by Yudhi Baskoro (#52) is SEO spam.
Mark T.

Posted Jan 12, 2009 at 3:14 PM | Permalink

The fastest FFT library for most processors, FFTW, is written in C, btw. Actually, its C code is generated by a program written in OCaml, and the resulting C code is, in some cases, all but indecipherable. There are also a variety of assembly-level header files written specifically for the most common SIMD processors. Compilers are still woefully inadequate at dealing with SIMD instructions (as well as parallelization). FFTW is comparable to tuned libraries written by CPU manufacturers, which are generally written in assembly.

Realistically, with low-level languages (C and FORTRAN are low-level compared to R or MATLAB), the difference in performance comes down to compiler ability. FORTRAN is generally assumed to be simpler for a programmer to write when producing scientific code, but that does not indicate any inherent performance gain, though it may allow more efficient code which is easier for a compiler to optimize. However, compilers do display an amazing ability to “fix” bad code (though not always), which likely becomes progressively more difficult as the language becomes more abstract.

That said, when I write speed critical code, I use neither canned library functions nor C-code, but instead rely on hand optimized assembly. This is slow and painful, but ultimately allows me to fine tune performance. There’s a balance between best code for set of instructions and cache access efficiency when looping on the same instructions. What works best is not always obvious by inspection alone, either – sometimes I beat the compiler and sometimes it beats me – but writing the code in the lowest level possible (assembly or machine) nearly always provides the best performance.

Mark
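The gap between straightforward code and a good library routine is easy to see in R. The sketch compares a direct O(n^2) evaluation of the discrete Fourier transform with R's built-in `fft()` (which, according to its help page, is based on Singleton's mixed-radix code rather than FFTW). `naive_dft()` is a hypothetical helper written for this comparison; it uses the same unnormalized, negative-exponent convention as `fft()`.

```r
# y[h] = sum_k z[k] * exp(-2*pi*i*(k-1)*(h-1)/n), evaluated directly: O(n^2).
naive_dft <- function(z) {
  n <- length(z)
  k <- 0:(n - 1)
  sapply(k, function(h) sum(z * exp(-2i * pi * k * h / n)))
}

set.seed(42)
z <- rnorm(256)
stopifnot(isTRUE(all.equal(naive_dft(z), fft(z))))

# fft() does not normalise, so the inverse transform must be divided by n.
stopifnot(isTRUE(all.equal(Re(fft(fft(z), inverse = TRUE)) / length(z), z)))

print(system.time(naive_dft(z)))
print(system.time(fft(z)))
```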
Steve

Posted Jan 13, 2009 at 4:52 PM | Permalink

Regarding the R is not like C comments, the NY Times didn’t say R was like C, but rather that “R is similar to other programming languages … in that it helps people perform a wide variety of computing tasks by giving them access to various commands”.

It then talked at length about how high level R is and that it is extensible.

Seems pretty levelheaded to me.
John

Posted Jan 17, 2009 at 7:46 AM | Permalink

R is great for data analysis; I have used it for about a year now. Mastering the language takes a while, though. The Matlab language is much easier, especially if you want to do advanced custom charts and complex scripts.

Another free toolbox that looks interesting is NIST Dataplot (http://www.itl.nist.gov/div898/software/dataplot/).
KlausB

Posted Jan 17, 2009 at 12:24 PM | Permalink

John A., Steve,
thank you for this thread.

It reminded me to take a look at R. I had intended to for quite some time, but kept delaying it because of other requests. I installed it a few days ago, and I'm now creating scripts for all my frequent downloads.

Jeez, now I feel silly for not using it sooner. It would have made some things much easier.

What I like most is the RODBC package, because I usually store all data in a database.

I've already had some occasions where things didn't work the way they should according to the manuals. Nevertheless, I found ways to get them running.

*Would you mind if I post entries here on my experiences: what didn't work, and how to get it working?

I'm usually doing it on Win2k, WinXP and Win2003 Server, so my solutions only relate to these platforms.

KlausB
KlausB

Posted Jan 17, 2009 at 12:49 PM | Permalink

How not to test a script
When developing and testing an R script, I do it this way:

– I download from the web into a local copy.
– I comment out the download.file(…) line and do all further tests from the local copy.
#
# I’ve already had experiences with folks
# trying to get a script working while accessing
# an FTP server in a production environment.
#
# They worked on two scripts in parallel,
# accessing the same server and several
# folders at the peak time of production,
# and they did bulk downloads.
#
# These guys drove me mad.
#
# So, guys, let’s be nice. Even
# system engineers/system administrators are,
# in general, plain humans. Let’s avoid
# harming them too much.
#
# — first test run, example:
#
url="http://www.ijis.iarc.uaf.edu/seaice/extent/plot.csv"
IARCfile="H:\\DB\\DAILY\\IARCplot.csv"
download.file(url,IARCfile,mode="wb",cacheOK=TRUE)
IARCbuffer=readLines(IARCfile)
#
# — all further test runs, example:
url="http://www.ijis.iarc.uaf.edu/seaice/extent/plot.csv"
IARCfile="H:\\DB\\DAILY\\IARCplot.csv"
#-------download.file(url,IARCfile,mode="wb",cacheOK=TRUE)
IARCbuffer=readLines(IARCfile)
#
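An alternative to commenting the download.file() line in and out is to let the script check for the local copy itself, so the server is only contacted once. A minimal sketch; `fetch_once()` is a hypothetical helper, and the demonstration writes a dummy file to a temporary path so that the placeholder URL is never contacted.

```r
# Download url to destfile only if there is no local copy yet, then read it.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")
  }
  readLines(destfile)
}

# Demonstration without network access: the local copy already exists,
# so the placeholder URL below is never contacted.
local_copy <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), local_copy)
lines <- fetch_once("http://example.invalid/plot.csv", local_copy)
lines   # "x,y" "1,2"
```

In the script above this becomes `IARCbuffer <- fetch_once(url, IARCfile)`; deleting the local file forces a fresh download.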
KlausB

Posted Jan 17, 2009 at 1:34 PM | Permalink

Some problems with RODBC package
#
# I did my very first trials with it,
# trying to insert/update Excel-tables.
#
# Reading is no problem.
# Writing is problematic.
#
#
# channel=odbcConnectExcel(xls.file=xlsfile,readOnly=FALSE)
# doesn’t work
#
# looking in Settings>Control Panel>Administrative Tools>Data Sources (ODBC)
# – a User DSN for Excel Files is there
#
# – I created a System DSN for Excel
# !!! take care that Read Only Connection isn't checked.
# That doesn't work either.
#
# Finally, I created a File DSN and configured it to
# be linked to a sheet named "dummy.xls".
# It's an empty sheet, always copied from a template folder.
# This worked.
#
# Why did I do it this way:
#
# sqlQuery(channel,"drop table ArcSeaIce")
# didn't work; I always got an error.
#
# Strange: deleting the table's contents did work.
# So I always have to start with an .xls file which
# doesn't already contain the table, not even an empty one.
#
# example of how it works now:
#
# Don't forget: the RODBC package has to be installed first,
# and it has to be loaded before any of its functions are used:
library(RODBC)
#--I created a File DSN named RtoDummy, linked to "dummy.xls"
ToExcel=data.frame(IARC)
channel=odbcConnect(dsn="RtoDummy")
sqlSave(channel,ToExcel,tablename="IARC",nastring="NA",colnames=FALSE,rownames=FALSE,safer=FALSE,fast=TRUE)
# close the connection when done
odbcClose(channel)
KlausB

Posted Jan 17, 2009 at 1:45 PM | Permalink

Steve, John A.

I’ve installed R-2.8.1
Can’t find an ncdf package that is up to date.
Suggestions?
klausb

Posted Jan 22, 2009 at 11:45 AM | Permalink

more online help and tips and tricks here:
klausb

Posted Jan 22, 2009 at 11:47 AM | Permalink

sorry, here:
http://wiki.r-project.org/rwiki/doku.php
Not sure

Posted Jan 23, 2009 at 5:25 PM | Permalink

The StatET plugin for Eclipse is very cool indeed.