Semgrep: Writing quick rules to verify ideas

When you want to quickly grep for something but the pattern is too elaborate, Semgrep comes in really handy. It’s a static analysis tool that has a lot of great use cases, but one usage I don’t hear about often is quickly writing disposable rules to validate an idea when reviewing code. So that’s what we’re going to do here!

Cross-site request forgery (CSRF) on GET requests

Most mature web applications and frameworks handle CSRF protection on POST/PUT/DELETE requests automatically. GET requests, however, are not supposed to perform state-changing actions, so they usually get no CSRF protection at all. That's where errors can slip in[1]! To quickly check for GET CSRF, I like to grep through all the GET (or even HEAD) routes and look for action words like create, update, delete, etc. It's a basic heuristic, but it works well enough to catch mistakes and low-hanging fruit.
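
To make the heuristic concrete, here is a minimal TypeScript sketch. It assumes the routes have already been extracted into simple { method, path } records. The Route type, the suspiciousRoutes function and the word list are hypothetical illustrations and do not come from any framework.

```typescript
// Hypothetical sketch of the GET CSRF triage heuristic, assuming routes
// have already been extracted into { method, path } records.
interface Route {
  method: string; // HTTP verb as written in the code, e.g. "GET"
  path: string;   // route path, e.g. "/users/delete"
}

// Verbs that are not supposed to change state, so they rarely get CSRF checks.
const SAFE_METHODS = new Set(["GET", "HEAD"]);

// Words suggesting a state-changing action (illustrative, extend as needed).
const ACTION_WORDS = ["create", "update", "delete"];

// Keep only GET/HEAD routes whose path contains an action word.
function suspiciousRoutes(routes: Route[]): Route[] {
  return routes.filter(
    (r) =>
      SAFE_METHODS.has(r.method.toUpperCase()) &&
      ACTION_WORDS.some((word) => r.path.toLowerCase().includes(word)),
  );
}

const flagged = suspiciousRoutes([
  { method: "GET", path: "/profile" },
  { method: "GET", path: "/users/delete" },
  { method: "POST", path: "/users/create" },
  { method: "head", path: "/reports/update_cache" },
]);

for (const r of flagged) {
  console.log(`${r.method} ${r.path}`);
}
```

Only the GET route for /users/delete and the HEAD route for /reports/update_cache are flagged. The POST route is left alone because its framework is expected to protect it already.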

Ungreppable patterns

In some frameworks, like Ruby on Rails, route definitions are mostly one-liners:

get 'profile', action: :show, controller: 'users'

However, some patterns are more complicated, like this example from Kibana:

  router.get(
    {
      path: '/internal/app_search/log_settings',
      validate: false,
    },
    enterpriseSearchRequestHandler.createRequest({
      path: '/as/log_settings',
    })
  );

This is where Semgrep will help.

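One might say that the code snippet above isn't too bad and could be grepped by including newlines in the regex. However, the path isn't always in the same place, because object keys can appear in any order and formatting varies between files. A Semgrep rule matches on the structure of the code rather than its text, so it is much more reliable.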
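
Here is a small TypeScript sketch of the problem. A line-by-line search, which is what grep does by default, finds the Rails one-liner but not the Kibana route. A regex that spans newlines only works for one particular layout. The reordered variant is a hypothetical rewrite with the object keys swapped, which is equivalent code.

```typescript
// Sketch: line-oriented grep vs. the multi-line Kibana route.
// `reordered` is a hypothetical variant with the object keys swapped.
const railsRoute = "get 'profile', action: :show, controller: 'users'";

const kibanaRoute = [
  "router.get(",
  "  {",
  "    path: '/internal/app_search/log_settings',",
  "    validate: false,",
  "  },",
  "  enterpriseSearchRequestHandler.createRequest({",
  "    path: '/as/log_settings',",
  "  })",
  ");",
].join("\n");

// Same route with `validate` before `path`: equivalent code, different text.
const reordered = kibanaRoute.replace(
  "    path: '/internal/app_search/log_settings',\n    validate: false,",
  "    validate: false,\n    path: '/internal/app_search/log_settings',",
);

// grep-style search: every match has to fit on a single line.
function grepLines(source: string, pattern: RegExp): string[] {
  return source.split("\n").filter((line) => pattern.test(line));
}

// The Rails one-liner is found...
console.log(grepLines(railsRoute, /^get\s+'[^']+'/).length); // 1
// ...but no single Kibana line holds both the GET call and its path.
console.log(grepLines(kibanaRoute, /router\.get\(.*path:/).length); // 0

// A regex that spans newlines works, but only for this exact key order.
const multiLine = /router\.get\(\s*\{\s*path:\s*'([^']*)'/;
console.log(multiLine.exec(kibanaRoute)?.[1]); // /internal/app_search/log_settings
console.log(multiLine.exec(reordered)); // null
```

Every extra layout (key order, quoting style, comments) would need another regex branch. Semgrep avoids this by parsing the code first.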

Workflow for building a rule

I want to match routes defined as in the snippet above where the first path sounds like a state-changing action.

Knowing the tool

The first part of the workflow is to actually know the tool you're working with! Read the documentation about writing rules so you know the features at your disposal. From reading it, I know that metavariable-regex is going to be useful here.

Using the playground

Semgrep.live is a playground where you can quickly test your rules with the latest version of Semgrep from the comfort of your browser. (Note: I wrote this a few months ago and the editor doesn't look the same anymore, but the workflow still works.)

Let’s start a new rule by setting TypeScript as the target programming language and pasting the Kibana code from the beginning of this blog post.

What I'm looking for here are path: "something" patterns inside a router.get(...) call, so I will express that in Semgrep terms. The Semgrep pattern is very close to the sentence I just wrote!

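It matches both occurrences of path (the route's own path and the one passed to createRequest), but that's perfectly fine. In short, path: $PATH matches any path key and binds its value to the metavariable $PATH. The router.get(...) part keeps only matches located inside a router.get call, with ... standing for any arguments. But really, read the documentation. :)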

We're almost there already! The most important part is still missing, however: matching only action names that “sound” state-changing. To do this, let's switch to the Advanced tab of the playground. While I'm there, I'll give my rule a meaningful id and set languages to TypeScript and JavaScript, because Kibana uses both.

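This is where having read the documentation (have I mentioned that already?) pays off. Otherwise, things might start looking a little cryptic. The playground now shows the YAML representation of the rule I was writing in the Simple tab. A few things to take note of: the two patterns from the Simple tab now appear as entries under patterns, and a finding is only reported when all of them match. The rule also carries the id, message, languages, and severity keys that every Semgrep rule needs.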

For the last part, I want Semgrep to find action words in the last segment of the path present in $PATH.

The documentation for metavariable-regex mentions the following:

The metavariable-regex operator searches metavariables for a Python re compatible expression. This is useful for filtering results based on a metavariable’s value. It requires the metavariable and regex keys and can be combined with other pattern operators.

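This is precisely what I'm looking for. Here, metavariable is $PATH and regex is ^.*/[^/]*(create|update|delete)[^/]*$ (see it in action on regex101 if you're not super comfortable with regular expressions yet). The greedy .*/ consumes everything up to the last slash, and [^/]* cannot cross a slash, so the action word has to appear in the final path segment.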
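
A quick way to sanity-check the regex outside Semgrep is to run it against a few sample paths. This TypeScript sketch assumes that JavaScript's regex engine treats this simple pattern the same way Python's re does. That should hold here, because the pattern uses no flags or lookarounds. Three samples are Kibana paths from the scan output further down, and /api/create/items is made up. The last sample keeps its quotes because the scan output prints $PATH with them, which suggests the bound value includes the quotes.

```typescript
// Sanity check for the rule's metavariable-regex, run with JavaScript's engine.
// Assumption: this simple pattern behaves the same as under Python's re.
const actionRegex = new RegExp("^.*/[^/]*(create|update|delete)[^/]*$");

const samples: string[] = [
  "/internal/app_search/log_settings",                // Kibana: no action word
  "/ws/sources/create",                               // Kibana: action word is the last segment
  "/as/engines/:engineName/curations/find_or_create", // Kibana: action word inside the last segment
  "/api/create/items",                                // made up: action word, but not in the last segment
  "'/ws/sources/create'",                             // Kibana path with the literal's quotes kept
];

for (const path of samples) {
  console.log(actionRegex.test(path), path);
}
```

Only the first and fourth samples fail to match. The trailing [^/]* happily absorbs the closing quote, so the quoted form is flagged too.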

The rule didn't match anything in my code snippet (as expected), so I added a second snippet with a made-up vulnerable pattern to validate that it works.
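
The made-up vulnerable snippet can be as simple as the hypothetical route below. The tiny router object is a stand-in assumed only so that the file runs on its own. It is not Kibana's router API. The router.get(...) call whose path ends in delete is the part the rule should flag.

```typescript
// Hypothetical test fixture: a made-up GET route that deletes something.
// `router` is a minimal stand-in so this file runs on its own; it is NOT
// Kibana's router API, it only records the routes registered through it.
type RouteConfig = { path: string; validate: boolean };
type Handler = () => string;

const registered: { method: string; path: string }[] = [];

const router = {
  get(config: RouteConfig, _handler: Handler): void {
    registered.push({ method: "GET", path: config.path });
  },
};

// The shape the rule targets: path: $PATH inside router.get(...),
// with an action word in the last path segment.
router.get(
  {
    path: '/internal/app_search/engines/delete',
    validate: false,
  },
  () => 'engine deleted',
);

// Same regex as the rule, applied to what was registered.
const actionRegex = new RegExp("^.*/[^/]*(create|update|delete)[^/]*$");
for (const route of registered) {
  console.log(actionRegex.test(route.path), route.method, route.path);
}
```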

To polish things up, I changed the message to "Check $PATH for GET CSRF", and Semgrep replaces $PATH with the actual path in the output.

This is what the final rule looks like:

rules:
- id: kibana_get_csrf
  patterns:
    - pattern: |
        path: $PATH
    - pattern-inside: router.get(...)
    - metavariable-regex:
        metavariable: $PATH
        regex: ^.*/[^/]*(create|update|delete)[^/]*$
  message: Check $PATH for GET CSRF
  languages: [ts, js]
  severity: WARNING

My real rule has more words in the “action word” regex, but I'll leave that as an exercise for the reader.
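
If you do extend the list, generating the alternation from an array avoids typos in a long regex. This is a hypothetical TypeScript helper. The extra words (remove, reset, disable) are only examples, not the list from my real rule. Each word is escaped in case it ever contains a regex metacharacter.

```typescript
// Hypothetical helper: build the rule's regex from a list of action words.
// Escape regex metacharacters so each word is matched literally.
function escapeRegex(word: string): string {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same structure as the rule: the action word must sit in the last path segment.
function actionRegexSource(words: string[]): string {
  const alternation = words.map(escapeRegex).join("|");
  return `^.*/[^/]*(${alternation})[^/]*$`;
}

// Example word list (illustrative only).
const words = ["create", "update", "delete", "remove", "reset", "disable"];
const source = actionRegexSource(words);
console.log(source); // paste this into the rule's regex key

const re = new RegExp(source);
console.log(re.test("/api/users/reset_password")); // true
console.log(re.test("/api/users/profile"));        // false
```

A broader list trades precision for recall. A short word like "add" would also match segments such as "address", so expect more noise to triage.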

Use your rule

Now that the rule is written, it's time to use it! Save the rule in a file and run Semgrep (output slightly trimmed to keep only the relevant bits):

$ semgrep scan --config kibana_get_csrf.yml --metrics off
 x-pack/plugins/enterprise_search/server/routes/app_search/curations.ts
     kibana_get_csrf
        Check '/internal/app_search/engines/{engineName}/curations/find_or_create' for GET CSRF


        110┆ path: '/internal/app_search/engines/{engineName}/curations/find_or_create',
          ⋮┆----------------------------------------
     kibana_get_csrf
        Check '/as/engines/:engineName/curations/find_or_create' for GET CSRF


        121┆ path: '/as/engines/:engineName/curations/find_or_create',


 x-pack/plugins/enterprise_search/server/routes/workplace_search/sources.ts
     kibana_get_csrf
        Check '/internal/workplace_search/sources/create' for GET CSRF


        924┆ path: '/internal/workplace_search/sources/create',
          ⋮┆----------------------------------------
     kibana_get_csrf
        Check '/ws/sources/create' for GET CSRF


        942┆ path: '/ws/sources/create',

And we have two findings (each route is reported twice, once for each path)! The second one, /internal/workplace_search/sources/create, isn't a CSRF: it's part of an OAuth flow, and a CSRF token is passed as a query parameter. The first one, /internal/app_search/engines/{engineName}/curations/find_or_create, was indeed a real CSRF. It was reported to the Elastic bug bounty program and fixed in version 8.4.0 by changing the route to require POST.

Conclusion

I hope this quick walkthrough helps you feel more confident about using Semgrep. It's a really fun tool that sits between grep and more in-depth code analysis tools like CodeQL. Its support for some languages isn't quite there yet, but report the bugs you find and help make it better. The playground has an easy link for reporting bugs. I've reported a few myself, and the team has always been super helpful.

PS: I know I kind of sound like I was sponsored to write this, but I swear I’m not affiliated with Semgrep or the company making it (r2c). :)


[1] GraphQL APIs can be another interesting vector for CSRF, but I won't cover that here.