Analysing C# code on GitHub with BigQuery (mattwarren.org)

109 points by matthewwarren 128 days ago

85 comments

tybit 128 days ago

1. ArgumentNullException, 2. ArgumentException and 5. ArgumentOutOfRangeException shows how badly C# needs more ability to specify/enforce contracts.

It's awesome that C# 8 will address the number 1 problem, with non nullable references but number 2 won't be addressed until records post C# 8 by the looks of things and number 5 not until exhaustive pattern matching which doesn't look to be on the cards at all AFAIK.

    stult 128 days ago

    Seriously. Contracts. I inherited a relatively substantial C# WCF business line app with a large number of classes each with many nullable properties. The null handling was awful, but there were too many classes to easily refactor to apply data contract attributes across the board (at least given that I am the sole available resource to do so). Instead I tried selectively applying the null object pattern to some of the most common troublemakers. It's improved things but god I wish the original dev had used data contract attributes from the beginning.

    Similarly, it's staggering the number of times I've seen someone wrap their entire method (or worse yet their entire program) in empty try-catch blocks just to avoid responsibly handling null and out of range exceptions. To me, it seems like the ubiquity of this antipattern indicates that there's something inadequate about the language, tooling, or developer culture. Or some combination thereof.

    I'm not enough of an expert to say what the problem is or how to fix it, but in my relatively limited experience I've found requiring data contracts up front smooths over a lot of the pain points. But it comes at a cost. I've never used an equivalent in any other language, so I don't how C# and .NET compare, but it does seem like a lot of boilerplate code that can limit decoupling if misapplied.

      emodendroket 128 days ago

      Well this is the problem Option classes are meant to solve but it's pretty tough to make that work without breaking all the old code where any reference value can be null

    matthewwarren 128 days ago

    Good point, yeah all the instances of 'ArgumentXXXException' points to there needing to be better language/tooling support for it.

nawtacawp 128 days ago

C# developers overwhelmingly prefer putting an opening brace { on it’s own line (query used)

Putting it on it's own line seems like a waste of space to me and doesn't contribute to the readability.

    partisan 128 days ago

    If I am not mistaken, Visual Studio automatically places an opening brace on the next line. When setting up coding standards, I asked my team not to fight Visual Studio. It generally has good auto-formatting functionality, but even more important is the uniformity and consistency of the code.

      shoover 128 days ago

      Opening brace on a new line is the default, but VS has excellent granular settings in this area. I was tired of having different standards for C and C#, so at one point we switched our defaults and ran a utility to convert all our C# code. We use a new line for namespaces, classes, and methods and the same line for everything else. VS handles it flawlessly and we haven’t looked back. Thinking back, I have not seen any inconsistencies committed in that time. VS autoformat is that good, with braces and white space.

        omgtehlion 128 days ago

        > a new line for namespaces, classes, and methods and the same line for everything else

        accidentally, we arrived to the same code style for both C# and C++...

          shoover 128 days ago

          Probably accidental here, too, in that the C code was older and larger. There was no good answer for the question of why C# was different, so the change was made.

          The only downside would be if I worked on someone else’s project with different standards. I don’t think VS can figure out autoformat at the project level, like, say, python mode in emacs.

            nwah1 128 days ago

            Have you looked at editorconfig?

              shoover 128 days ago

              I hadn't seen it before, but that looks great. It has such a wide variety of plugins. Thanks for the tip.

    emodendroket 128 days ago

    Are we really going to derail this thread to argue about this? Everyone has their preferences but the most important thing is just picking a style and this is the one VS defaults to.

    johnlbevan2 128 days ago

    I used to agree, but since adopting the Allman style have found that in many cases it's very helpful in making code readable (especially if using an editor which doesn't provide any auto-format options).

    That said, some languages are impacted by style choice; e.g. JavaScript. Given many of us have to develop in multiple languages, it would be great to select a style supported by all languages (better would be if all languages supported all styles).

    https://stackoverflow.com/questions/12233999/is-it-true-that...

    jimktrains2 128 days ago

    To waste something there needs to be a finite amount of it in the first place.

    Considering the large number of coding standards that mandate the placement on a new line, tossing out readability because it doesn't matter to you seems short sighted.

      doodpants 128 days ago

      There is a finite amount of vertical space on one's screen, and thus a finite number of lines of code that can be displayed simultaneously without scrolling. Reducing the amount of scrolling that needs to be done to view the entire body of a method can itself be an improvement to readability.

        jimktrains2 127 days ago

        I would argue that fitting the most lines on the screen as possible is not a noble goal. Whitespace is very important for readability.

    bayonetz 128 days ago

    I find it easier to visually match the start and end of the blocks that the brace pair defines. Having said that, my team uses the opening brace on same line style and that's fine too. Since I use an IDE, it doesn't matter much either way because It provides great features like block folding and auto highlighting of a matching brace.

    Don't understand why we spend so much time arguing about such low stakes stuff as this.

    mrighele 128 days ago

    Beauty is in the eye of the beholder. For example I used to put the opening brace on a line of its own, mostly for simmetry reasons ( `{` in the same column as the relevant `}` ).

    By the way, after many years I had to change it for coding style reasons. Now when I look back to old code and I see that lonely `{` my reaction is "ewww", so I guess it is also a matter of habit.

    mda 128 days ago

    I know this is a matter of preference and habit, but I also cringe when I see this style, makes it less readable for me as well.

    colejohnson66 128 days ago

    It’s the Visual Studio default. It definitely improves readability for me sometimes. Having empty lines makes it easier to read.

    mercer 128 days ago

    Perhaps it makes copy/paste easier? I have a tendency to put the '[' of an array on a separate line for that reason, for example.

mulrian 128 days ago

Interesting to see 'var' isn't used as much as I thought it would be.

    matthewwarren 128 days ago

    Yeah, I thought the same.

    I now realise the better test of 'var' would be to compare it to how many times it's not used. That is count the times that 'var myClass = new MyClass()' is used v 'MyClass myClass = new MyClass()'. But my regex skills aren't good enough for that and I'm not even sure if you can do it purely in a regex?

    Also I imagine that 'var someNumber = 1' is used less that 'int someNumber = 1', but again, I'm not sure how to do the regex? I certainly only use 'var' when the same word is on both sides of the '=', so not in the case of int/double/decimal etc.

      mattmanser 128 days ago

      I still use var in that scenario, because if you need different types you have to suffix numbers with an identifier anyway:

          var i = 1; //int
          var j = 1f; //float
          var k = 1m; //decimal
          var l = ""; //string
          var m = ''; //char

        lukepothier 128 days ago

        There are a few more for numeric types:

          var integer = 1;
          var long = 1l;
          var float = 1f;
          var double = 1d;
          var decimal = 1m;
          var unsignedInteger = 1u;
          var unsignedLong = 1ul;

        matthewwarren 128 days ago

        Yeah, I know you can do that, but I can never remember which suffix is which ;-)

    weavie 128 days ago

    Back when I used to work at a large enterprise the use of 'var' was banned in our coding standards.

    Using Linq was heavily frowned upon. It took quite a few years before it started to be accepted..

    At one point we were also strongly encouraged to place a #region around every single method! I never understood why, and luckily that one did get dropped.

      romanovcode 128 days ago

      What's the point of not using stable language features? I don't get it.

        Cakez0r 128 days ago

        I've (unfortunately) had to work with this rule before. The justification was that it makes the code more readable.

        I can understand banning SQL style linq and allowing only functional style, but I guess that's more of a personal preference.

        kylereeve 128 days ago

        I used to work in a company with a similar policy. Never got a satisfactory answer, I think there's just lots of FUD around any language feature that looks different than what people are used to.

        recursive 128 days ago

        goto is a stable language feature.

        JustSomeNobody 128 days ago

        While it shouldn’t be banned, it should be used appropriately. We have some areas of our project that communicate with hardware where we cannot use it because of the performance hit.

          tigershark 128 days ago

          Why on earth? It is resolved by the compiler, there is absolutely no performance difference between using var and explicitly specifying the type.

            ambulancechaser 128 days ago

            undoubtedly that person was talking about LINQ and not var usage

              tigershark 127 days ago

              Undoubtedly he didn't read properly then. The banned feature was var, linq was just heavily frowned upon.

              dbattaglia 128 days ago

              Or possibly confusing "var" and "dynamic"?

              JustSomeNobody 128 days ago

              Yes, LINQ not var.

                btschaegg 128 days ago

                Well, that on the other hand does indeed make sense, depending on the context. The only question there could be if it's a good idea to use a garbage collected language to directly[1] interact with hardware.

                [1]: Well, at least directly enough for LINQ performance to be a problem.

                  JustSomeNobody 128 days ago

                  We're not real time, but we do have some time constraints.

                  Yes, all this was kicked around and argued about.

          romanovcode 128 days ago

          Yeah, I agree. You can make some seriously shitty unreadable code using LINQ just to look smart. But that's what code reviews are there for.

            btschaegg 128 days ago

            I agree. That, and not using LINQ often enough doesn't resolve that problem per se. Rethinking the problem (or the way you attempt to solve it) might very well improve the code much more.

            One of Kevlin Henney's talks references a really neat quote by Poul Anderson on this subject:

            > I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated.

            emodendroket 128 days ago

            Sometimes ReSharper makes some seriously wild, unreadable suggestions for converting code to Linq.

              romanovcode 127 days ago

              Yeah, but IMO after Roslyn I don't see the benefit of ReSharper, definitely not for the 100 dollar price per year.

                emodendroket 127 days ago

                That happened right when I started a new job so I haven't touched it that much. R# used to be absolutely essential though.

          lightbyte 128 days ago

          Are you talking about "var"? Because that shouldn't affect your performance, this generates equal IL code:

          var x = 10;

          int x = 10;

      darklajid 128 days ago

      Yesterday a colleague wanted to compile _and deploy_ code. Checked it out. It .. failed to compile. Sure enough, he fixed it (and checked the change in) before deploying it (test env, thank god).

      The code was generating some JSON using Linq and created JSON properties: new JsonObject(LinqExpressionHere)

      The name for the property was defined as

      $"SomePrefix_{x.Name}"

      which he helpfully changed to

      "SomePrefix_{x.Name}"

      which then compiled, but crashed in the test environment because of - surprise - duplicate attributes..

        maxxxxx 128 days ago

        I just did this in my code! Ported a ton of PHP to C# and forgot to add $ to the front of strings. A compiler warning would be nice for this.

      matthewwarren 128 days ago

      > At one point we were also strongly encouraged to place a #region around every single method! I never understood why, and luckily that one did get dropped

      Shudder!!

      Especially since VS has plugins (or maybe it's built-in?) that will let you fold/hide individual methods, if you really want to do that!

    mattmanser 128 days ago

    I use it religiously now, so much quicker to program with imo as you don't need to find out the return type. Surprised it's less used than async/await though, which I think MS massively overhypes the usefulness of (and is annoyingly making a lot of classes have async behaviour by default which can result in very weird random hangs).

    Although I imagine a significant number of class declarations would contain no var or async/await simply because of their nature.

CDillinger 128 days ago

Interesting how 'css/style.cs' is number 10 on the size list. Is this just a style sheet with the wrong extension? I always assumed GitHub's language detection was based on syntax rather than just extension. Does anyone know if this is the case?

    matthewwarren 128 days ago

    I don't know about what's displayed on GitHub.com, but the dataset I queried only looks for a '.cs' extension, so there's a chance that some non C# files got in. The dataset is here https://bigquery.cloud.google.com/table/fh-bigquery:github_e...

    Fortunately, most of the queries I've done are aggregations or looking for C# syntax, so there's only a few places non C# code could get in (I already filter out binary files, which I noticed earlier)

emodendroket 128 days ago

I'm surprised to see so little var, especially when ReSharper by default scolds you for not using it.

sAbakumoff 128 days ago

It's always nice to see BQ-based research like this!

recursive 128 days ago

> Finally, an interesting list is the top 10 using statements that aren’t System, Microsoft or Windows namespaces:

> using System.Web.Mvc;

But... that is a System namespace.

    matthewwarren 128 days ago

    LOL, yeah I completely missed that one!!

    I've just updated the post, 'log4net' makes in into the top 10 now.

codeulike 128 days ago

using UnityEngine;

Is the second most used 'using' that's non-Microsoft (after Nunit). Shows how important Unity is for C# take up. And presumably most people using Unity to write games are not going to be sticking their code on GitHub either so this must be a fraction of the real use.

    throwawayReply 128 days ago

    IT's important to open source C# take-up, but I'd wager the vast majority of C# usage is closed-source shops.

    c# just isn't favoured by hobbyists except for game-programming. But in business world (at least from my UK perspective) where nothing comes close to github, c# still seems as healthy as ever and that doesn't depend on unity at all.

      codeulike 128 days ago

      Oh yes for sure, but that's always been there. .NET has always been used widely for business internal stuff and that's what kept it alive. But the recent C# renaissance seems to be about app building stuff like Xamarin or Unity.

Nucode 128 days ago

I’m really confused why you think #regions are terrible. I use them generously to collapse sections, and they objectively make code more organized. I would not be as productive without them.

oaiey 128 days ago

It always disappoints me as a C# dev is the lack of top level application products like WordPress and friends. All the top lists in Matt's post are either infrastructure or development tools.

    mattmanser 128 days ago

    Iguess it's the pre-compilation requirement. Might change with the new version, but I think the inertia was lost for the language long ago.

    Here in the UK C# seems a lot more popular than Java, but I must admt I'm getting more and more js jobs these days, which fills me with a slight dread for the future as I really dislike programming in it.

    I think Unity has helped C# become more popular for the masses again.

      oaiey 128 days ago

      I am not worried about C#. It just lacks open source applications. There are tons of poster boy closed source applications :)

      201709User 128 days ago

      There are more JS vacancies because nobody wants to touch that shit.

    matthewwarren 128 days ago

    Hmm, yeah that's a good point. I'd not looked at the individual repos too much, but you're right, there's not many (popular) products written in C#.

    However, I guess my lists are limited by what is on GitHub, Paint.NET springs to mind, but it's not open source.

      oaiey 128 days ago

      Yeah, paint.net also crossed my mind. Would not even show up. Too "small".

    codeulike 128 days ago

    I suppose its because people would have had to host a C# Wordpress on Windows servers instead of nice cheap Linux ones. Maybe that will change going forwards with .NET Core etc

      throwawayReply 128 days ago

      And not just the expensive windows server but for a long time people considered windows builds of postgres / mysql not production-ready (How much of that was fud and how much truth I do not know), so typically any DB solution is Sql-Server which is another significant expense.

        pjmlp 128 days ago

        They still do.

        We have customers running Linux instances just for Postgres.

    colejohnson66 128 days ago

    I’ve always assumed that’s because C# used to not be cross platform (yes, there’s Mono, but that isn’t as popular as it should be)

    EnderMB 128 days ago

    It's not as application-heavy as a lot of other platforms, but there are definitely some great applications built on top of .NET.

    If you're looking for a CMS, you'd be hard-pressed to find a better open-source CMS that runs on ASP.NET than Umbraco.

      oaiey 128 days ago

      Not looking :)

    tybit 128 days ago

    I completely agree, it is disapointing. On the plus side, I'm hopeful that now that .net (dot net core at least) is OSS, cross platform and maturing things will start changing for the better.

joepour 128 days ago

Surprised there wasn't more Class1.cs